Abstract

Background: A seventh order of methanogens, the Methanomassiliicoccales, has been identified in diverse anaerobic
environments including the gastrointestinal tracts (GIT) of humans and other animals and may contribute significantly
to methane emission and global warming. Methanomassiliicoccales are phylogenetically distant from all other orders
of methanogens and belong to a large evolutionary branch composed of lineages of non-methanogenic archaea such
as Thermoplasmatales, the Deep Hydrothermal Vent Euryarchaeota-2 (DHVE-2, Aciduliprofundum boonei) and the Marine
Group-II (MG-II). To better understand this new order and its relationship to other archaea, we manually curated and
extensively compared the genome sequences of three Methanomassiliicoccales representatives derived from human
GIT microbiota, "Candidatus Methanomethylophilus alvus", "Candidatus Methanomassiliicoccus intestinalis" and
Methanomassiliicoccus luminyensis.

Results: Comparative analyses revealed atypical features, such as the scattering of the ribosomal RNA genes in the
genome and the absence of the eukaryotic-like histone gene otherwise present in most Euryarchaeota genomes.
Previously identified in Thermoplasmatales genomes, these features are here extended to several completely
sequenced genomes of this large evolutionary branch, including MG-II and DHVE2. The three Methanomassiliicoccales
genomes share a unique composition of genes involved in energy conservation, suggesting an original combination
of two main energy conservation processes previously described in other methanogens. They also display substantial
differences from each other, such as their codon usage, the nature and origin of their CRISPR systems and the genes
possibly involved in particular environmental adaptations. The genome of M. luminyensis encodes several features to
thrive in soil and sediment conditions, suggesting an environmental distribution broader than the GIT. Conversely, "Ca. M.
alvus" and "Ca. M. intestinalis" do not present these features and could be more restricted and specialized to the GIT.
Prediction of amber codon usage, either as a termination signal of translation or coding for pyrrolysine, revealed
contrasting patterns among the three genomes and suggests a different handling of the Pyl-encoding capacity.

Conclusions: This study provides the first insights into the genomic organization and metabolic traits of the seventh
order of methanogens. It suggests contrasting evolutionary histories among the three analyzed Methanomassiliicoccales
representatives and provides information on characteristics conserved among methanogens overall and among
Thermoplasmata.

Keywords: Archaea, Methanomassiliicoccales, Methanomethylophilus, Methanomassiliicoccus, Origin of replication (ORI)
binding (ORB) motif, Genome streamlining, CRISPR, Pyrrolysine Pyl, H2-dependent methylotrophic methanogenesis,
Energy conservation



Background 

Methanogenic archaea are distributed worldwide in anaerobic
environments and account for a large proportion of methane
emissions into the atmosphere, partly due to anthropogenic
activity (e.g. rice fields and livestock). Over the last ten years,
sequences of novel archaeal lineages distantly related to all
orders of methanogens have recurrently been found in diverse
anaerobic environments. One of these lineages, phylogenetically
related to the Thermoplasmatales, was first reported in the rumen
[1,2] and was thereafter referred to as Rumen Cluster-C in this
environment [3]. The methanogenic nature of these archaea was
subsequently strongly supported by the co-occurrence in human
stool samples of 16S rRNA sequences affiliated with this lineage
and mcrA genes (a functional marker of methanogens) distantly
related to those of any other methanogens [4,5]. The final
evidence that they represent a new order of methanogens was
recently given by the isolation of Methanomassiliicoccus
luminyensis B10 from human feces [6] and the culture in consortia
of several strains of this order: "Candidatus Methanomethylophilus
alvus" [7] and "Candidatus Methanomassiliicoccus intestinalis" [8]
from human feces samples, MpT1 and MpM2 [9] from termite gut and
"Candidatus Methanogranum caenicola" [10] from waste treatment
sludge. All the culture-based studies agreed on a common
methanogenic pathway relying on the obligate dependence of the
strains on an external H2 source to reduce methyl-compounds into
methane. This restricted metabolism was previously observed in
only two methanogens from the digestive tract (Methanosphaera
stadtmanae and Methanomicrococcus blatticola) and was considered
an exception [11]. The apparently wide distribution of this
obligate metabolism within this novel order of methanogens turns
this exception into one of the important pathways among
methanogens overall. It also highlights the need for a more
cautious use of the term "hydrogenotrophic methanogens", which
generally refers to methanogens growing on H2 + CO2 but also fits
an increasing number of described methanogens growing on
H2 + methyl-compounds. Two names were proposed for this order,
Methanoplasmatales [9] and Methanomassiliicoccales [10], the
latter now being validated by the International Committee on
Systematics of Prokaryotes [12]. For this reason, the name
Methanomassiliicoccales will be used in the current publication to
refer to this novel order of methanogens.

The global contribution of Methanomassiliicoccales representatives
to methane emission could be large, considering that this order
constitutes one of the three dominant archaeal lineages in the
rumen [3] and represents half or more of the methanogens in some
ruminants [13-15]. Using mcrA and 16S rRNA sequences, several
studies have also highlighted the broad environmental distribution
of this order, which is not limited to the digestive tracts of
animals but is also retrieved from rice paddy fields, natural
wetlands, and subseafloor and freshwater sediments, for example
[9,10,16,17]. Methanomassiliicoccales were split into three large
clusters: the "Ca. M. alvus" cluster, grouping sequences mostly
retrieved from the digestive tracts of animals; the M. luminyensis
cluster, mainly composed of sequences from soils and sediments
and, to a lesser extent, from digestive tracts; and the Lake Pavin
cluster, formed by sequences retrieved from diverse environments
but not digestive tracts [16].

The genome sequences of three different Methanomassiliicoccales
members cultured from human stool samples, M. luminyensis B10 [18],
"Ca. M. intestinalis Mx1-Issoire" [8] and "Ca. M. alvus Mx1201"
[7], have recently been made available [19]. M. luminyensis shows
98% identity with "Ca. M. intestinalis" over the whole 16S rRNA
gene and only 87% with "Ca. M. alvus". According to the
environmental origin of the sequences constituting the large
cluster to which they belong, M. luminyensis and "Ca. M.
intestinalis" might be more recently adapted to gut conditions
than "Ca. M. alvus". Moreover, the large difference in genome size
and [G + C] % content between the two Methanomassiliicoccus spp.
genomes suggests a rapid evolution of one of them in response to
its adaptation from soil or sediment to digestive tract conditions
[8]. Despite the large phylogenetic distance between "Ca. M.
alvus" and the Methanomassiliicoccus spp., these genomes share
unique genomic characteristics. In particular, the analysis of the
"Ca. M. alvus" and M. luminyensis methanogenic pathways revealed
that they lack the 6-step C1-pathway forming methyl-CoM by the
reduction of CO2 with H2, otherwise present in all previously
sequenced methanogens, consistent with their restriction to
H2-dependent methylotrophic methanogenesis [16]. Moreover, these
analyses helped define putative alternative substrates to methanol
through the identification of genes involved in the use of
methylated amines and dimethyl sulfide. Methylated-amine
utilization by Methanomassiliicoccales representatives has also
been proposed in a metatranscriptomic study on rumen methanogens
[17]. The use of tri-, di- and monomethylamine, with the obligate
dependence on H2, has subsequently been validated in vivo with M.
luminyensis [20]. This property could be significant for human
health since gut-produced trimethylamine (TMA) could be implicated
in two different diseases [19-22]. The presence of pyrrolysine
(Pyl, O), the 22nd proteinogenic amino acid, is associated with
this metabolism, as it is incorporated in the methyltransferases
involved in the utilization of methylated amines through amber
codon suppression by a Pyl-tRNA [23,24]. All the necessary genetic
machinery is found in the three genomes of the
Methanomassiliicoccales, including the genes for pyrrolysine
synthesis (pylBCD), the amber suppressor tRNA(Pyl) (pylT) and the
dedicated amino-acyl tRNA synthetase (pylS). Their structure and
unusual features, together with the evolutionary implications of
this system, have recently been described elsewhere [25].

These original metabolic and genetic characteristics, as well as
the closer phylogenetic proximity of this order to
Thermoplasmatales than to other orders of methanogens, prompted us
to perform a more comprehensive analysis of these three genomes.
We provide here their general characteristics, including
comparisons to phylogenetic neighbor genomes, and derive potential
metabolism and adaptation to environmental conditions from their
gene composition. In the particular context of the missing genes
of the CO2 reduction pathway otherwise shared by all other
methanogens, we reevaluate the global core of enzymes that are
unique and specific to all methanogens and highlight the atypical
composition of genes likely involved in energy conservation. The
potential usage of the amber codon as a translational stop signal
or as a codon encoding Pyl in proteins was analyzed and suggests a
differential handling of the Pyl-encoding capacity among the three
Methanomassiliicoccales representatives.

Results and discussion 

General genomic features 

Genome size, [G + C] %, CDS and tRNA numbers were reported
separately in the announcements of these genomes [7,8,18]. These
data are gathered in Table 1 with other newly defined general
features.

The tRNA gene complement present in the genomes is in part
redundant and covers the usual 20 amino acids, with the exception
of Lys in M. luminyensis, for which no tRNA was detected: the
corresponding tRNA gene likely lies in the remaining ~17 kbp of
this genome, which are currently not available (Table 1). A
complete archaeal set of amino-acyl tRNA synthetases is found in
all three genomes, Asn- and Gln-tRNAs being obtained by an
Asp-/Glu-tRNA (Asn/Gln) amidotransferase [26]. As previously
described [25], an important feature is the presence of a
tRNA(Pyl) in all three genomes. Several small non-coding RNAs
(ncRNAs, complete list in Additional file 1: Table S1) were
detected. Among them are a Group II catalytic intron (only in "Ca.
M. alvus"), the RNA component of the archaeal signal recognition
particle (aSRP RNA) and the archaeal RNase P.

Strikingly, the 16S and 23S rRNA genes are not clustered and do
not form a transcriptional unit as found in most bacterial and
archaeal genomes. Among archaea, this unusual characteristic was
first documented in Thermoplasmatales [27], but it is also found
in related lineages such as the uncultured Marine Group II (MG-II)
and Aciduliprofundum boonei (Figure 1). This particular
organization of the rRNA genes is consistent with the phylogenetic
position of the seventh order of methanogens determined using a
concatenation of ribosomal proteins [16] and constitutes a
distinctive characteristic of Thermoplasmatales and related
lineages. From a practical point of view, this also indicates that
the Ribosomal Intergenic Transcribed Spacer Analysis, recently
proposed as a tool to study the diversity of the methanogenic
archaea in digesters [28], will likely fail to detect the
Methanomassiliicoccales representatives.
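The notion of "clustered" versus "scattered" rRNA genes can be made concrete with a small check on annotated gene coordinates. The sketch below is only illustrative: the function names, the 5 kbp threshold and the coordinates are assumptions chosen for the example and are not taken from the genome annotations; it also treats the replicon as linear.

```python
def gap_between(a, b):
    """Distance in bp between two (start, end) intervals; 0 if they overlap or touch."""
    (_, end_first), (start_second, _) = sorted([a, b])
    return max(0, start_second - end_first - 1)


def is_clustered(rrna_16s, rrna_23s, max_gap=5_000):
    """True if the 16S and 23S genes lie within max_gap bp of each other.

    max_gap is an arbitrary illustrative threshold, not a published criterion.
    """
    return gap_between(rrna_16s, rrna_23s) <= max_gap


# Hypothetical coordinates, for illustration only.
operon_like = is_clustered((1_000, 2_470), (2_700, 5_600))
scattered = is_clustered((1_000, 2_470), (600_000, 602_900))
print(operon_like, scattered)
```

A spacer-based method such as the one cited above relies on amplifying the region between adjacent 16S and 23S genes, so a genome for which this check fails would not be expected to yield a product.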

As previously reported [8], the three genomes show significant
size heterogeneity, with a variation of 58% (from around 1.7 Mbp
to 2.6 Mbp, Table 1). Such heterogeneity is found even within the
same genus, with a 36% size variation between the genomes of "Ca.
M. intestinalis" and M. luminyensis (1.9 to 2.6 Mbp). The number
of genes is highly variable and ranges from 1,705 ("Ca. M. alvus")
to 2,713 (M. luminyensis). The average CDS size and gene density
are very close among the three genomes (around 900 bp and a
protein coding gene every 984 to 1,054 bp). The main translation
initiation codon is AUG (methionine), for which two copies of the
corresponding tRNA are detected in "Ca. M. alvus" and three copies
in "Ca. M. intestinalis" and M. luminyensis. To a lesser extent,
GUG and UUG are also found as translation start codons (Additional
file 1: Table S2). Nucleotide composition [G + C] % ranges from
41.3% to 60.5% (Table 1) [7,8,18]. Codon usage patterns among CDS
primarily reflect this [G + C] % variation, "Ca. M. intestinalis"
primarily using AT-rich codons for a given amino acid (Additional
file 1: Table S2). Two of the three stop codons follow this usage
pattern: the ochre codon UAA accounts for 45% of the stop codons
in the genome of "Ca. M. intestinalis" and only 17% and 14%,
respectively, in
the genomes of "Ca. M. alvus" and M. luminyensis, and a

Table 1 Genome statistics

Feature                                    "Ca. M. alvus"   "Ca. M. intestinalis"   M. luminyensis
Genome size^a                              1,666,795        1,931,651               2,637,810^d (2,620,233)
DNA G + C content                          55.6%            41.3%                   60.5%
% DNA coding region                        89.5%            88.4%                   87.6%
Intergenic regions mean size (SD)^a        102 (175)        119 (264)               121 (238)
Genes mean G + C content                   56.3%            42.4%                   61.0%
Putative replicons                         1 (+1)^b         1 (+1)^b                1 (+1)^b
Extrachromosomal elements                  NA^c             NA^c                    NA^c
Total genes                                1,705            1,882                   2,713
RNA genes                                  52               50                      52
rRNA genes (5S-16S-23S)                    4 (2-1-1)        4 (2-1-1)               4 (2-1-1)
tRNA genes                                 48               46                      48
Protein coding genes                       1,653            1,832                   2,661
Mean size of protein coding genes (SD)^a   901 (667)        930 (890)               859 (676)
Median size of protein coding genes^a      771              780                     732
Gene products with function prediction     1,335            1,476                   2,002
Gene products assigned to arCOGs           1,271            1,438                   2,065
Gene products assigned Pfam domains        123              125                     204
Gene products with signal peptides         247              336                     512
Gene products with transmembrane helices   281              389                     585
CRISPR repeats                             1^e              1                       1

^a Sizes are given in bp.
^b Presence of two different cdc6 genes per genome. See the text for more information.
^c Not available.
^d Data from [8]: the value in brackets is the total bp (26 contigs) available from database [GenBank: CAJE01000001 to CAJE01000026], analyzed in this study.
^e Presence of CRISPR repeats split into two neighboring loci (see Additional file 1: Table S3) surrounding a DNA sequence containing one gene encoding a putative transposase.

same trend is observed for the opal codon UGA (Additional file 1:
Table S2). However, a different pattern is observed for the amber
codon UAG and could be the result of a different selection process
(see the dedicated section on amber codon usage and putative
Pyl-containing proteins). All the ribosomal RNA genes of the three
genomes have a [G + C] % above 50%. In "Ca. M. intestinalis", they
thus have a much higher [G + C] % than the genome average. When
compared to M. luminyensis, the general characteristics of the
"Ca. M. intestinalis" genome suggest streamlining accompanied by a
sharp [G + C] % reduction, as previously observed in free-living
Prochlorococcus [29]. This potential genomic evolution could be
related to the recent colonization of the digestive tract by "Ca.
M. intestinalis" from soil or sediment environments.
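The size variation and gene density figures quoted in this section can be rechecked from the Table 1 values. The following is a minimal sketch, with function and variable names chosen here for illustration; it assumes that "variation" means the relative increase of the larger genome over the smaller one and that gene density is sequence length divided by the number of protein coding genes, using the 2,620,233 bp actually available for M. luminyensis.

```python
# Values transcribed from Table 1 (all sizes in bp).
GENOMES = {
    "Ca. M. alvus": {"size": 1_666_795, "available": 1_666_795, "cds": 1_653, "rna": 52, "total": 1_705},
    "Ca. M. intestinalis": {"size": 1_931_651, "available": 1_931_651, "cds": 1_832, "rna": 50, "total": 1_882},
    # "available" is the sequence present in the 26 deposited contigs.
    "M. luminyensis": {"size": 2_637_810, "available": 2_620_233, "cds": 2_661, "rna": 52, "total": 2_713},
}


def size_variation_pct(smaller_bp, larger_bp):
    """Relative size increase of the larger genome over the smaller one, in percent."""
    return 100.0 * (larger_bp - smaller_bp) / smaller_bp


def bp_per_cds(sequence_bp, n_cds):
    """Average number of bp of sequence per protein coding gene."""
    return sequence_bp / n_cds


for name, g in GENOMES.items():
    # Protein coding genes plus RNA genes should add up to the total gene count.
    consistent = g["cds"] + g["rna"] == g["total"]
    print(name, round(bp_per_cds(g["available"], g["cds"]), 1), "bp per CDS; totals consistent:", consistent)

alvus = GENOMES["Ca. M. alvus"]["size"]
intestinalis = GENOMES["Ca. M. intestinalis"]["size"]
luminyensis = GENOMES["M. luminyensis"]["size"]
print(round(size_variation_pct(alvus, luminyensis), 1))
print(round(size_variation_pct(intestinalis, luminyensis), 1))
print(luminyensis - GENOMES["M. luminyensis"]["available"], "bp not available")
```

Under these assumptions the values come out at about 58% and between 36% and 37% for the two size comparisons, 984 to 1,054 bp per protein coding gene, and roughly 17.6 kbp of M. luminyensis sequence not yet available, in line with the figures given in the text.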

CRISPR elements 

The CRISPR system confers on prokaryotes a highly adaptive and
heritable resistance to foreign genetic elements such as plasmids
and phages [30-33]. CRISPR loci are composed of genome-specific
conserved Direct Repeats (DRs) separated by small sequences
(spacers) which constitute a record of past infections.
CRISPR-associated (Cas) proteins are responsible for the
integration of new spacers borrowed from invasive DNA and use the
small antisense RNA transcripts of these spacers to protect the
cell from new invasions. CRISPR loci were previously reported in
the three Methanomassiliicoccales genomes [7,8,18] and are
characterized in the present study. The CRISPR DRs are
concentrated in one genomic unit in "Ca. M. intestinalis" and M.
luminyensis but are interrupted by a gene encoding a putative
IS4-type transposase (AGI85628.1) in "Ca. M. alvus". The DRs of
the three genomes differ from each other in length (31 and 36 bp,
Additional file 1: Table S3), sequence and associated 2D-structure
(Additional file 2: Figure S1), and belong to three different
superclasses. A CRISPR map analysis [34] attributed the M.
luminyensis DRs to the superclass D, family 3 and the "Ca. M.
intestinalis" DRs to the superclass A (no family) with a partial
motif #27 which is exclusively shared with Methanococcales
sequences (from Methanothermococcus okinawensis,
Methanocaldococcus jannaschii and Methanocaldococcus fervens,
Additional file 2: Figure S1). The "Ca. M. alvus"
DRs (ATCTACACTAGTAGAAATTCTGAATGAGTTTT

[Figure 1: panel A, phylogeny of Euryarchaeota (Methanosarcinales, Methanocellales, Methanomicrobiales, Halobacteriales, ANME-1, Archaeoglobales, Methanomassiliicoccales, uncultured Marine Group II, Aciduliprofundum boonei, Thermoplasmatales, Methanobacteriales, Methanopyrales, Methanococcales, Thermococcales); panel B, organization of the rRNA genes in each lineage.]

Figure 1 Genomic features of ribosomal genes in Euryarchaeota. (A) Phylogeny of Euryarchaeota highlighting the position of the
Methanomassiliicoccales (according to [16]). The seven orders of methanogens are in red. (B) Genomic organization of ribosomal genes in Euryarchaeota:
5S, 16S and 23S rRNA genes are symbolized by blue, green and orange arrows, respectively. They are indicated irrespective of the (+) or (-) DNA strand
carrying them. A plain line defines an operon organization where tRNAs (when present) are not shown, nor the number of genes encoding rRNA with the
exception of the Methanomassiliicoccales. The 5S rRNA gene in brackets refers to a second 5S rRNA copy isolated from the 16S-23S-5S rRNA gene operon
in Methanococcus maripaludis C5.



AGAC, superclass E) could not be classified in any
sequence/structure family and likely represent a new family of
CRISPR DR elements. The number of spacers within DRs ranges from
12 to 113 per locus (from 59 to 113 per genome). Spacer size
ranges are specific to each genome, from 25 to 28 bp in "Ca. M.
alvus" to 35 to 40 bp in M. luminyensis (Additional file 1: Table
S3). A few other CRISPR-like elements are also found in as many as
three copies and their functional role remains unknown
(Additional file 1: Table S3).
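The repeat and spacer organization described above can be illustrated by splitting a CRISPR array on its direct repeat. This is a simplified sketch and not the method used in this study: it assumes exact, non-degenerate repeat copies, the function name is hypothetical, and the two spacers are invented sequences; only the 36-nt "Ca. M. alvus" DR is taken from the text.

```python
# Direct repeat of "Ca. M. alvus" as given in the text (36 nt).
ALVUS_DR = "ATCTACACTAGTAGAAATTCTGAATGAGTTTTAGAC"


def extract_spacers(array_seq, repeat):
    """Return the sequences lying between consecutive exact copies of the repeat."""
    parts = array_seq.split(repeat)
    # parts[0] precedes the first repeat and parts[-1] follows the last one;
    # neither is flanked by repeats on both sides, so both are dropped.
    return [p for p in parts[1:-1] if p]


# Invented spacers (25 and 28 nt), used only to build a toy array.
toy_spacers = ["ACGTTGCAAGCTTAGGCTAACGTTA", "TTGACCGATAGGCTTACGATCGATCGAA"]
toy_array = ALVUS_DR + toy_spacers[0] + ALVUS_DR + toy_spacers[1] + ALVUS_DR

spacers = extract_spacers(toy_array, ALVUS_DR)
print(len(spacers), [len(s) for s in spacers])
```

Real arrays often contain slightly degenerate repeats, so a dedicated CRISPR finder would normally be preferred over exact string splitting.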

According to the CRISPR system classification proposed by Makarova
et al. [35] on the basis of the organization and composition of
the Cas protein-coding genes found in the neighborhood of the
CRISPRs, M. luminyensis presents a CRISPR-Cas system of subtype
I-C (WP_019177384.1 to WP_019177390.1). The CRISPR-Cas system of
"Ca. M. intestinalis" is a hybrid of the subtypes I-A and I-B,
since its organization corresponds to subtype I-B but it contains
the signature gene of the subtype I-A (Cas8a) (AGN26276 to
AGY50180.1). The recently defined PreFran subtype (for Prevotella
and Francisella) is present in "Ca. M. alvus" (AGI85629 to
AGI85632). Notably, the Cas1 protein of "Ca. M. alvus" is
predicted to contain a pyrrolysine (see section on amber codon
usage and putative Pyl-containing proteins).

As suggested by the different superclass assignments of the
repeats and the different types of CRISPR-Cas system, these
CRISPRs likely result from non-vertical inheritance among the
three species. The PreFran type, found in only 20 bacterial
genomes so far, is rather uncommon in comparison to the type I of
the Methanomassiliicoccus spp. Bacteria that hold the PreFran type
are generally found in tight association with animals, and the
genus Prevotella is one of the dominant genera in the rumen [36]
and the human gut [37], suggesting that "Ca. M. alvus" may have
acquired this system from other gut bacteria. Moreover, the
spacers are specific to each of the three genomes, suggesting that
they have undergone different histories of infection. In "Ca. M.
alvus", one of the spacers is 93% similar (25 of 27 nt) to a ssDNA
virus isolated from pig feces (JX305998.1).
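The similarity quoted for this spacer is a simple ungapped identity: 25 matching positions out of 27 is 92.6%, which rounds to 93%. A minimal sketch of that calculation follows; the function name is hypothetical and the two 27-nt sequences are invented placeholders, not the actual spacer or viral sequences.

```python
def percent_identity(a, b):
    """Ungapped identity between two equal-length sequences, in percent."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Placeholder 27-nt sequence and a copy carrying two substitutions.
placeholder_spacer = "ACGTACGTACGTACGTACGTACGTACG"
placeholder_target = "T" + placeholder_spacer[1:13] + "A" + placeholder_spacer[14:]

print(round(percent_identity(placeholder_spacer, placeholder_target)))
```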

With the exception of viruses from the families Myoviridae and
Siphoviridae (head-tail viruses), which also infect bacteria,
archaeal viruses sequenced to date have almost no significant
residue identity with each other or with sequences in public
databases [38,39]. Accordingly, the lack of detection of prophage
sequences by dedicated software does not imply the absence of
prophages in these three genomes: some clusters of 10-30 adjacent
genes with few significant matches in public databases might
represent still unknown prophages. Furthermore, genes distantly
related to phage genes are found in the three genomes and could
belong to unknown prophages or represent residual traces of past
infection. This is, for example, the case of two contiguous genes,
present in the vicinity of the "Ca. M. intestinalis" CRISPR locus,
which encode putative proteins (YP_008071639.1 & YP_008071640.1)
with similarity to phage capsid synthesis proteins.

Genome replication 

Origins of replication were identified with a consensus Origin
Recognition Box (ORB) motif recently identified from active
replication origins of Thaumarchaeota (Nitrosopumilus maritimus),
Crenarchaeota and Euryarchaeota [40]. Several ORB motifs were
found in the three genomes, most of them arranged in pairs
(Table 2). A consensus sequence for a Methanomassiliicoccales ORB
motif was deduced and shows little difference from the archaeal
consensus recently proposed [40] (Table 2).
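A motif search of this kind can be sketched as a mismatch-tolerant scan of both strands. The code below is an illustration rather than the procedure used in this study: the function names, the mismatch threshold and the poly-A/poly-C flanks of the toy sequence are assumptions, and the consensus is the Methanomassiliicoccales consensus of Table 2 written without its alignment gap. Spacing is computed as the start of the second ORB minus the end of the first, which is consistent with the positions and spacings listed in Table 2.

```python
# Methanomassiliicoccales consensus ORB from Table 2, alignment gap removed (22 nt).
MMC_CONSENSUS = "GTTCCAGTGGAAATGGAGGGGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_orb_like(seq, motif=MMC_CONSENSUS, max_mismatches=3):
    """Scan both strands; return (start, end, strand, mismatches), 1-based inclusive."""
    k = len(motif)
    patterns = (("+", motif), ("-", reverse_complement(motif)))
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for strand, pattern in patterns:
            d = hamming(window, pattern)
            if d <= max_mismatches:
                hits.append((i + 1, i + k, strand, d))
    return hits


def spacing(first_end, second_start):
    """Spacing between two paired ORBs, as second start minus first end."""
    return second_start - first_end


# "Ca. M. alvus" ORB1 from Table 2 (gap removed), embedded in arbitrary flanks.
alvus_orb1 = "GTTCCAGTGGAAATGGTGGGGT"
toy_seq = "A" * 10 + alvus_orb1 + "C" * 10
print(find_orb_like(toy_seq))
print(spacing(99, 138), spacing(1998, 2045), spacing(36, 292))
```

The mismatch threshold controls sensitivity: a strict threshold keeps only near-consensus boxes, whereas a looser one also reports more divergent candidates that would then need to be judged by their genomic context.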

Each of the three genomes possesses two copies of the orc1/cdc6
(Origin Recognition Complex/Cell division cycle 6) gene (Table 3).
At least two ORB motifs are found in the vicinity of only one of
the two orc1/cdc6 genes. In the draft genome of M. luminyensis,
these two genes are located on the same contig (CAJE01000021),
allowing comparison with the other two genomes. In every case, the
orc1/cdc6 genes are each located on a different strand (Additional
file 2: Figure S2). They are close together within the M.
luminyensis and "Ca. M. intestinalis" genomes (respectively around
70 and 90 kbp), and more distant in "Ca. M. alvus" (around 695
kbp). They are inversely oriented in the three genomes. Consistent
with a recent study [41], phylogenetic analysis reveals that these
genes correspond to two paralogs, orc1/cdc6.1 and orc1/cdc6.2
(Additional file 2: Figure S3). orc1/cdc6.1 lies close to the
predicted origin of replication, displays a conserved genomic
context (Figure 2), is slow-evolving and groups phylogenetically
with Thermoplasmatales/DHVE2/uncultured Marine Group II
(Additional file 2: Figure S3), consistent with vertical
inheritance. On the other hand, the orc1/cdc6.2 copies display
much faster evolutionary rates, lie in a non-conserved genomic
context (Figure 2), and show an inconsistent phylogenetic
placement close to Crenarchaeota (Additional file 2: Figure S3).
This may be due to a tree reconstruction artifact or may represent
a possible horizontal gene transfer from an unspecified
crenarchaeon. Given its higher conservation, its conserved genomic
context and its vicinity to ORB motifs, Orc1/Cdc6.1 is likely the
main initiator protein, and Orc1/Cdc6.2 may represent an inactive
or accessory copy, possibly active in different environmental
conditions.

The replication gene set is similar to that of the most closely
related lineages (Table 3). However, some interesting features are
present in the three genomes. For example, like MG-II, they do not
harbor any homologs of the single-stranded binding protein SSB,
whereas Thermoplasmatales and DHVE2 have both RPA and SSB. The
absence of SSB may strengthen the sister relationship of the
Methanomassiliicoccales and MG-II lineages, as observed in a
phylogenetic reconstruction based on ribosomal proteins [16]. The
Methanomassiliicoccales,



Table 2 ORB motifs found in the Methanomassiliicoccales genomes

ORB                             Sequence                  Position            Spacing  Orientation  Comment
"Ca. M. alvus" ORB1             GTTCCAGTGGAAATGG-TGGGGT   78 - 99             39       inverted     downstream orc1/cdc6.1
"Ca. M. alvus" ORB2             GTTCCACTGGAAACAG-AGGGGT   138 - 159                    inverted     downstream orc1/cdc6.1
"Ca. M. alvus" ORB3             nrCCACTGGAAACAG-AGGGGT    1977 - 1998         47                    upstream orc1/cdc6.1
"Ca. M. alvus" ORB4             GTTCCACTGGAAATGG-TGGGGT   2045 - 2066                               upstream orc1/cdc6.1
"Ca. M. intestinalis" ORB1      ATTACAGTGGAAATGA-AGGGGT   15 - 36             256      inverted     downstream orc1/cdc6.1
"Ca. M. intestinalis" ORB2      rrTGCAGTGGAAATGA-AGGGGT   292 - 313                                 downstream orc1/cdc6.1
"Ca. M. intestinalis" ORB3^a    GTTCCAGTGGAAATGA-AGGGGT   795626 - 795647                           downstream ftsZ
"Ca. M. intestinalis" ORB4^a    TCTGCACTGGAAATGA-AGGGGT   1576211 - 1576232            inverted     downstream fused nifH/nifE
M. luminyensis ORB1             GTTCCAITGGAAATCG-GCAGGA   73488 - 73475^b     113                   downstream orc1/cdc6.1
M. luminyensis ORB2             GTTCCAGTGGAAATAA-AGGGGT   73341 - 73362^b              inverted     downstream orc1/cdc6.1
Methanomassiliicoccales         GTTCCAGTGGAAATGG-AGGGGT
consensus ORB                   A
Archaea consensus ORB           CTTCCAGTGGAAACGAAAGGGGT                                             Pelve et al. [40]

Bases in bold indicate consensual bases of the ORB sequence in the Methanomassiliicoccales. The "Ca. M. alvus" ORBs, and the ORB2 of M. luminyensis and "Ca. M.
intestinalis", might be extended by a "GGGGGT" sequence otherwise not conserved in the 4 other Methanomassiliicoccales ORBs and the Archaea consensus ORB.
^a Not found in close association with another ORB.
^b Contig [GenBank: CAJE01000021.1].

Table 3 DNA replication proteins compared to the corresponding components in Thermoplasmatales, MG-II and DHVE2

Protein                          "Ca. M. alvus"   "Ca. M. intestinalis"   M. luminyensis      MG-II / DHVE2 / Thermoplasmatales
ATP-dependent DNA ligase         AGI85913         AGN25909                WP_019176428        X ■ ■
Orc1/Cdc6                        AGI84758 (1)     AGN25419 (1)            WP_019178385 (1)    ■ ■ ■ ■■
                                 AGI85775 (2)     AGN27158 (2)            WP_019178317 (2)
DNA Pol D large subunit (DPL)    AGI85099         AGN26720                WP_019177373        ■ ■ ■
DNA Pol D small subunit (DPS)    AGI84772         AGN27082                WP_019178373        ■ ■ ■
FEN-1                            AGI85207         AGN26626                WP_019176843        ■ ■ ■
GINS 51                          AGI84890         AGN27100                X                   X ■ ■
GINS 23                          X                X                       X                   X X X
DNA Gyrase subunit B             [AGI86382]       [AGY50228]              [WP_019178436]      [■] [■] [■]
DNA Gyrase subunit A             [AGI86381]       [AGN27159]              [WP_019178437]      [■] [■] [■]
MCM                              AGI86392         AGN26346                WP_019178416        ■ ■ ■
                                                  AGN27203
PCNA                             AGI84935         AGN27068                WP_019176118        ■ ■ ■
DNA Pol B                        AGI86264         AGN26701                WP_019177962        ■ ■ ■
                                                                          WP_019177491
Primase large subunit (PriL)     AGI84820         AGN27177                WP_019178297        ■ ■ ■
Primase small subunit (PriS)     AGI86400         AGY50234                WP_019178400        ■ ■ ■
RFC large subunit                AGI85559         AGN26596                WP_019176873        ■ ■ ■
RFC small subunit                AGI85778         AGN26166                WP_019177244        ■ ■ ■
RNaseH II                        AGI86158         AGN25790                WP_019177553        ■ ■ ■
TopoVI subunit A                 AGI85998         AGN26743                WP_019177592        ■ ■ X
TopoVI subunit B                 AGI85997         AGN26742                WP_019177591        ■ ■ X
Topo IB                          X                X                       X                   X X X
SSB                              X                X                       X                   X ■ ■
RPA2                             AGI84916         AGN25568                WP_019178149        ■ ■ ■
                                                  AGY50184                WP_019177069
rpa2A (rpa associated protein)   AGI84915         AGN25567                WP_019178150        ■ ■ ■
NAD-dependent DNA ligase         [AGI85455]       X                       X                   [■] X X

Proteins in brackets indicate horizontal transfers from bacteria; proteins in italics indicate fast evolving additional copies likely representing decaying paralogs,
genes horizontally transferred among archaea, or homologs arising from integration of foreign elements. Absent proteins (or unavailable due to genome
incompleteness) are indicated by an X. (1) and (2) after the Orc1/Cdc6 protein accession numbers indicate Orc1/Cdc6.1 and Orc1/Cdc6.2, respectively.



Marine Group II, and DHVE2 harbor both subunits of the archaeal
topoisomerase TopoVI, supporting a specific loss of this gene in
Thermoplasmatales, which replaced it with a bacterial-type DNA
gyrase [42]. Moreover, all three Methanomassiliicoccales
representatives also harbor a bacterial-like DNA gyrase, known to
have been acquired from bacteria in late emerging Euryarchaeota
[41]. Some components are present as an extra copy in the three
genomes (in bold in Table 3), for example the Minichromosome
Maintenance Protein (MCM) in the genome of "Ca. M. intestinalis",
which is highly divergent with respect to the other MCM coding
genes and lies in a genome region with no synteny with the other
closely related genomes. This is also the case for an extra PolB
coding gene identified in the genome of M. luminyensis. Finally,
genes coding for two additional OB-fold containing proteins
(RPA-like) were identified in the genomes of "Ca. M. intestinalis"
and M. luminyensis. All these extra copies are very divergent and
likely represent decaying paralogs or homologs arising from the
integration of foreign elements. In addition, we found a
bacterial-type NAD-dependent DNA ligase homolog in the genome of
"Ca. M. alvus" that appears to originate from a specific and
recent horizontal gene transfer from a bacterium of the Prevotella
genus, which is abundant in the human gut microbiota (Additional
file 2: Figure S4A).

An important feature shared by the three Methanomassiliicoccales
representatives, the Thermoplasmatales and other related lineages
is the lack of the eukaryotic-like histone found in other
Euryarchaeota [43], suggesting that the loss of this gene occurred
early in the evolution of the whole lineage. Surprisingly, no gene
coding for
homologues of the bacterial-type HTa histones known to

[Figure 2: gene maps of the regions surrounding orc1/cdc6.1 (panel A) and orc1/cdc6.2 (panel B) in "Ca. M. alvus", M. luminyensis and "Ca. M. intestinalis".]

Figure 2 Genomic regions surrounding the orc1/cdc6.1 (A) and orc1/cdc6.2 (B) genes in the three genomes of Methanomassiliicoccales.
Each homologous gene (i.e. showing more than 30% amino acid identity and an e-value < 10^[exponent illegible] when analyzed by blast against each other) from the 2
regions of the 3 genomes is colored differently and connected with shading. The black arrows represent genes involved in the replication process. The
grey arrows represent other genes of various functions with no homologue detected in the corresponding region of the 2 other genomes. "Hypoth."
refers to genes encoding hypothetical proteins.



have replaced the native histone in Thermoplasmatales 
and DHVE2 are present in the Methanomassiliicoccales 
genomes and the MG-II genome. The DNA packaging 
function could be fulfilled in M. luminyensis by an Alba 
protein (WP_019176109.1) also presents in Thermoplas- 
matales [44] and MG-II, but absent in ''Ca, M. alvus" 
and ''Ca, M. intestinaUs". Few candidate proteins with a 
very weak similarity to bacterial histones and a Lys- 
and Arg-rich tail were identified in M, luminyensis 
(WP_019177894.1) and "Ca, M, intestinalis" (AGN268051) 
but not in "Ca, M. alvus". While the proteins responsible 
for this crucial function remain elusive, a homologue of the 
histone acetyltransferase of the ELP3 family was identified 
in the three genomes (WP_019178580.1, AGN27049 
and AGI86364). Only M, luminyensis possesses a histone 
deacetylase Hdal, related to Crenarchaeota and not found 
in other Thermoplasmatales (WP_019177579.1). 
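The homology criterion stated in the Figure 2 caption (more than 30% amino acid identity together with an e-value cutoff) can be sketched as a simple filter over BLAST tabular output. This is a minimal illustration, not the authors' pipeline: it assumes the standard 12-column tabular format (outfmt 6), the gene identifiers are hypothetical, and the default `max_evalue` is a placeholder because the exponent is not legible in this copy of the caption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    identity: float  # percent amino acid identity
    evalue: float


def parse_blast_tabular(lines):
    """Parse BLAST tabular (outfmt 6) lines into Hit records.

    Default column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), float(fields[10])))
    return hits


def homologous_pairs(hits, min_identity=30.0, max_evalue=1e-5):
    """Return (query, subject) pairs passing both thresholds.

    max_evalue is an assumed placeholder, not the value used in the paper.
    """
    return {
        (h.query, h.subject)
        for h in hits
        if h.identity > min_identity and h.evalue < max_evalue
    }


# Hypothetical example rows (gene names are illustrative only).
example_rows = [
    "geneA\tgeneB\t45.2\t200\t100\t2\t1\t200\t1\t200\t1e-40\t150",
    "geneA\tgeneC\t25.0\t180\t120\t5\t1\t180\t1\t180\t1e-20\t80",
    "geneA\tgeneD\t60.0\t30\t12\t0\t1\t30\t1\t30\t0.5\t20",
]
example_pairs = homologous_pairs(parse_blast_tabular(example_rows))
```

Only the first row passes both thresholds; the second fails on identity and the third on e-value, which is why both conditions are checked together.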

Core genome 

The best BLAST hits of the CDS from the three genomes were most frequently found
in other archaea (70% to 82%), with around 18% to 30% in Bacteria and less than
0.3% in Eukaryota (Additional file 1: Table S4). It is likely that some of these
reflect lateral gene transfer events, consistent with the presence of genomic
islands whose [G + C] % composition differs from the genome average, as observed
in "Ca. M. alvus" and, more pronounced, in "Ca. M. intestinalis" (Additional
file 2: Figure S2).
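Genomic islands of atypical [G + C] % composition, as mentioned above, are commonly located with a sliding-window scan. The sketch below is an illustrative assumption about how such a scan could look, not the method used for Additional file 2: Figure S2; the window size, step and deviation threshold are arbitrary placeholder values.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def atypical_gc_windows(genome, window=5000, step=2500, max_delta=0.05):
    """Flag windows whose GC fraction deviates from the genome mean.

    Returns the genome-wide GC fraction and a list of
    (start, end, deviation) tuples for windows exceeding max_delta.
    All default parameters are illustrative, not taken from the paper.
    """
    mean_gc = gc_fraction(genome)
    flagged = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = genome[start:start + window]
        delta = gc_fraction(chunk) - mean_gc
        if abs(delta) > max_delta:
            flagged.append((start, start + len(chunk), round(delta, 3)))
    return mean_gc, flagged


# Toy sequence: an AT-rich background with one GC-rich insert.
toy_genome = "AT" * 50 + "GC" * 10 + "AT" * 50
toy_mean, toy_islands = atypical_gc_windows(
    toy_genome, window=20, step=20, max_delta=0.5
)
```

In practice a flagged window is only a candidate island; codon usage and the presence of mobile-element genes would be needed to support a lateral transfer origin.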

The core genome of the three species is composed of 658 CDS. While the number of
CDS shared between genome pairs partly reflects their phylogenetic relatedness,
an impressive proportion of CDS are specific to each one, in particular for
M. luminyensis (Figure 3, Additional file 1: Table S5 for a complete list). Of the
core genome, 173 genes are not found in the closest lineages (Ferroplasma
acidarmanus, Thermoplasma acidophilum, Thermoplasma volcanium, uncultured Marine
Group II and Aciduliprofundum boonei) (Table 4, complete data in Additional
file 1: Table S5). A part of these genes could correspond to specific traits of the
Methanomassiliicoccales, at least for 20 of them which have no close homologue
sequence in the databases (Additional file 1: Table S5). Another part of these
genes reflects the metabolic pathway of the Methanomassiliicoccales
representatives, methanogenesis, which is not shared with the Thermoplasmatales or
any of the other related lineages for which genomic or physiological data are
available. As discussed below, some of these genes are unique to methanogens.
Among the predicted core proteins, 227 have no homologues in the two other
methanogens commonly found in the same environment, the human gut
(Methanobrevibacter smithii and Methanosphaera stadtmanae). Some of these
differences rely on the particular methanogenic pathway of the
Methanomassiliicoccales, which can use methylated amines as substrate [20], which
is not the case for M. smithii and M. stadtmanae. One hundred and two core
proteins have no homologues in either the closely related lineages or the two gut
methanogens (Table 4, complete data in Additional file 1: Table S6). Some show
hits to other methanogens (Methanocellales, Methanomicrobiales and
Methanosarcinales), and are specific for methanogenesis/energy conservation.
Others likely reflect ancient lateral gene transfer events (LGTs) in the ancestor
of the Methanomassiliicoccales. They include proteins involved in carbohydrate
metabolism (glycosyl transferases, sugar transporters), nitrogen metabolism, and
several proteins specific to the Methanosarcinaceae and involved in
methanogenesis (see below).

General metabolism and adaptations to environment

Analysis of archaeal clusters of orthologous groups (arCOG [45]) resulted in
1,271, 1,438 and 2,065 assigned functions for "Ca. M. alvus", "Ca. M.
intestinalis" and M. luminyensis, respectively (representing 77-79% of all CDS)
(Additional file 1: Table S7). Components of cell wall/membrane and envelope
biogenesis (class M) were less abundant than in the other gut methanogens
M. smithii and M. stadtmanae. Indeed, in contrast to these Methanobacteriales,
electron micrographs of M. luminyensis did not show a prominent cell-wall-like
structure [6]. However, the synthesis of activated mannose from fructose-6-P
appears possible, which would allow the biosynthesis of N-glycans potentially
associated with a cell wall. A specific enrichment was observed for inorganic ion
transport and metabolism (class P) and, as noted for other methanogens, for
coenzyme transport and metabolism (class H). When analyzed in more detail, many of
the predicted transporters are ABC transporter permease proteins with homology to
those identified in other methanogens (Additional file 1: Table S8). Noteworthy is
the presence of quaternary ammonium compound efflux pumps as well as specialized
systems involved in substrate acquisition for specialized methanogenesis-related
functions (H2-dependent methylotrophic methanogenesis, see below): this includes
putative transporters for dimethylamine (AGI85872.1/AGI85374.1/AGI85246.1 for
"Ca. M. alvus", AGN26255.1 for "Ca. M. intestinalis", WP_019178528.1 for
M. luminyensis) and trimethylamine (AGI85867.1, AGN26256.1, WP_019178522.1). The
following part of the section focuses on several genomic features of the three
Methanomassiliicoccales representatives that suggest metabolic adaptations to
their environment. An overview of the inferred general metabolism is given in
Additional file 2: Figure S5. As usually observed in methanogens, the three
species harbor an incomplete reductive TCA cycle [46]. Further details on lipid,
amino acid and purine synthesis pathways, as well as molecular nitrogen fixation,
are presented in Additional file 3.

Table 4 Proteome of the three Methanomassiliicoccales representatives compared to their phylogenetic neighbors,
human gut methanogens and NCBI nr proteins

Core genome of Methanomassiliicoccales: 658 protein sequences | Specific^a | Shared^b
Phylogenetic neighbors | 173 | 485 (125 absent from human gut methanogens)
Human gut methanogens | 227 | 431 (63 absent from phylogenetic neighbors)
Phylogenetic neighbors and human gut methanogens | 102 | 556 shared with at least one, encompassing: 125 absent from human gut methanogens; 71 absent from phylogenetic neighbors; 360 shared with the two groups
NCBI non-redundant protein sequences database | 20 (21)^c | 637

^a Number of deduced proteins of the core genome of Methanomassiliicoccales that are not found in the corresponding organisms.
^b Number of deduced proteins of the core genome of Methanomassiliicoccales that are also found in the corresponding organisms.
^c The value of 21 encompasses CDS that are specific to the proteome of the Methanomassiliicoccales together with either those of the phylogenetic neighbors
or of the human gut methanogens, without any other BLAST hits in the NCBI nr protein sequences database.

Figure 3 Shared and unique CDS among the three genomes.
Venn diagram indicating the core genome at its center, deduced
from a BLAST analysis of the CDS from the 3 genomes of the
Methanomassiliicoccales. Unique and shared CDS among genome
pairs are also given.
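The specific/shared counts reported for the core genome in Table 4 follow from simple set operations on presence/absence of homologues. The sketch below reproduces that bookkeeping with synthetic integer identifiers (hypothetical stand-ins for the 658 core proteins); only the set sizes 658, 485, 431 and 360 are taken from Table 4, and the function name is an illustrative assumption.

```python
def partition_core(core, in_neighbors, in_gut):
    """Split a core proteome by presence of homologues in two groups.

    core:         identifiers of core proteins
    in_neighbors: core proteins with a homologue in phylogenetic neighbors
    in_gut:       core proteins with a homologue in gut methanogens
    """
    core = set(core)
    n = core & set(in_neighbors)
    g = core & set(in_gut)
    return {
        "specific_vs_neighbors": len(core - n),
        "shared_with_neighbors": len(n),
        "specific_vs_gut": len(core - g),
        "shared_with_gut": len(g),
        "absent_from_both": len(core - (n | g)),
        "shared_with_at_least_one": len(n | g),
        "neighbors_only": len(n - g),
        "gut_only": len(g - n),
        "shared_with_both": len(n & g),
    }


# Synthetic identifiers arranged to match the set sizes in Table 4:
# 360 shared with both groups, 485 with neighbors, 431 with gut methanogens.
core_ids = set(range(658))
neighbor_ids = set(range(0, 485))
gut_ids = set(range(0, 360)) | set(range(485, 556))

counts = partition_core(core_ids, neighbor_ids, gut_ids)
```

With these sizes the partition gives 173 and 227 specific proteins, 102 absent from both groups, and 556 shared with at least one, split as 125 + 71 + 360, in agreement with the "Encompassing" breakdown of the table.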

Similarly to other methanogens and unlike the Thermoplasmatales representatives,
the three Methanomassiliicoccales lack PurK for the purine synthesis pathway.
Two PurE-like enzymes were identified in each genome (AGI84793.1, AGI85002.1,
AGN25661.1, AGN26431.1, WP_019178351.1, WP_019177087.1) without clear assignment
to class I or class II PurE (Additional file 3). Depending on the assignment of
these PurE, the ATP-dependent activity of PurK might be substituted by a class I
PurE in the presence of a high concentration of CO2 or by a class II PurE, both
avoiding the hydrolysis of ATP [47]. The former possibility could represent an
adaptation to the high CO2 concentrations in anaerobic environments, as proposed
for other methanogens [47].

Two possible sources of ammonia are predicted to be common to the three
Methanomassiliicoccales: a direct uptake from the environment by dedicated
transporters (Additional file 1: Table S8) and an intracellular production, as a
by-product of methanogenesis from monomethylamine. The presence of some of these
transporters in close association with the genes involved in methanogenesis from
monomethylamine suggests that they could alternatively be used to export ammonium
when monomethylamine is used for methanogenesis. Ammonia could also be derived
from urea in "Ca. M. intestinalis", which possesses a ureA-G operon encoding a
urease (AGN27148.1 to AGN27154.1) and a urea transporter (AGN27055.1). Ammonia is
likely assimilated by a glutamine synthetase GlnN, one in "Ca. M. alvus" and
"Ca. M. intestinalis" (AGI86325.1; AGN25771.1) and two in M. luminyensis
(WP_019177566.1; WP_019177539.1, this second one likely acquired through LGT from
bacteria). M. luminyensis is predicted to be a diazotroph with a putative
flexibility regarding its dependency on molybdenum, while "Ca. M. alvus" and
"Ca. M. intestinalis" probably lack the capacity to fix N2 (Additional file 3).
N2 fixation capacity has been found among soil and sediment methanogens but not
in common gut methanogens (Additional file 3) [48-50]. Accordingly, the potential
capacity of M. luminyensis to fix N2 could reflect an adaptation to soil or
sediment conditions and a facultative association with digestive tracts.

Each Methanomassiliicoccales genome encodes at least one catalase (katE),
peroxiredoxin (prx), rubredoxin (rub) and rubrerythrin (rbr) to resist oxygen
exposure (Additional file 1: Table S9). M. luminyensis presents the highest
antioxidant capacity, in particular with 8 copies of the peroxiredoxin (prx)
gene, against 4 and 2 copies in "Ca. M. intestinalis" and "Ca. M. alvus",
respectively. M. luminyensis is also the only one to harbor homologues of
superoxide dismutase (sodA) and desulfoferrodoxin (dfx). A large diversity and
redundancy of the antioxidant systems was previously reported for dominant rice
field soil methanogens, the Methanocellales, and described as a specific
adaptation of these methanogens to oxic episodes regularly occurring in these
environments [48,49]. In line with its probable diazotrophic capacity, the larger
number and diversity of genes encoding antioxidant enzymes in M. luminyensis
argue for a greater adaptation to soil environments than in "Ca. M. alvus" and
"Ca. M. intestinalis". A glycine-betaine ABC transporter (WP_019176328.1,
WP_019176329.1, WP_019176330.1) was also found in M. luminyensis. This kind of
transporter helps to cope with external variations in salt concentration by
accumulating glycine-betaine as an osmoprotectant and was previously identified
in Methanosarcinales [51,52]. No similar transporter of glycine-betaine was
identified in "Ca. M. alvus" or "Ca. M. intestinalis".

Interestingly, among the three Methanomassiliicoccales representatives,
"Ca. M. alvus" is the only one to encode a choloylglycine hydrolase
(YP_007713843.1), which confers resistance to bile salts encountered in the
gastro-intestinal tract (GIT). This gene is also present in the genomes of the
two other dominant human gut methanogens, M. smithii and M. stadtmanae [53,54],
and could have been transferred from gut bacteria (Additional file 2:
Figure S4B). Another adaptation to the GIT could be inferred from the presence of
a conserved amino acid domain corresponding to COG0790 (TPR repeat, SEL1
subfamily) in at least one protein of each Methanomassiliicoccales
representative. This conserved domain has been previously identified in proteins
involved in interactions between bacteria and eukaryotes and was never reported
in archaea [55], suggesting an adaptation to digestive tracts unique to
Methanomassiliicoccales among archaea. In that case, the number of genes encoding
proteins with this domain in the Methanomassiliicoccales genomes (28 in
"Ca. M. alvus", 6 in "Ca. M. intestinalis" and one in M. luminyensis) would
support a higher adaptation of "Ca. M. alvus" to digestive tracts.

Methanogenesis and core enzymes specific to 
methanogens 

It was previously reported that M. luminyensis and "Ca. M. alvus" lack the genes
that encode the 6-step C1 pathway leading to methyl-CoM by the reduction of CO2
with H2 [16]. Our current analysis revealed a similar lack of these genes in
"Ca. M. intestinalis" (Figure 4). It also reveals that "Ca. M. intestinalis" does
not harbor the genes mtsAB (Figure 4), which code for enzymes likely involved in
methanogenesis from dimethylsulfide [56]. The composition of the
methyltransferases involved in the H2-dependent methylotrophic methanogenesis
from the three genomes was partially determined before [7,8,16,17] and is
compiled in Additional file 1: Table S10, with their relative genomic position
displayed in Additional file 2: Figure S6.

Figure 4 Proposed pathways for methanogenesis and energy conservation in the Methanomassiliicoccales representatives. The protein
names are in bold. The predicted pathways and enzymes present in the three Methanomassiliicoccales species are in blue, those absent from "Ca.
M. intestinalis" are in green and those absent from "Ca. M. alvus" are in red. MtaA and MtbA are marked with an asterisk to signify that the homologs
present in the Methanomassiliicoccales are not yet assigned to one or the other enzyme category. "X" refers to the uncharacterized lipid-soluble
electron transporter. The question mark points out that the enzymes involved in the reoxidation of the lipid-soluble electron transporter remain to
be uncovered. See Table 6 and Additional file 1: Table S10 for a description of the set of genes involved.

A pool of genes conserved among all methanogens and not found in any other
archaea was recently determined by Kaster et al. [57]. These genes encode the
subunits of two enzymatic complexes unique to and shared by all methanogens, the
methyl-H4MPT:coenzyme M methyltransferase (Mtr) and the methyl coenzyme M
reductase (two complexes of isoenzymes, Mcr and Mrt), as well as proteins of
unknown function. Being unique to methanogens, these uncharacterized proteins
likely have an important role in methanogenesis and could be directly associated
with the functioning of Mcr and Mtr [57]. The lack of Mtr and the other genes of
the CO2-reductive pathway in the three Methanomassiliicoccales described here
prompted us to reevaluate the overall methanogenesis markers. In addition to the
five genes coding for subunits of the Mtr enzymatic complex, two former
methanogenesis markers (annotated as methanogenesis markers 10 and 14 in the
databases and belonging to arCOG00950 and arCOG04866, respectively) are absent
from the three Methanomassiliicoccales genomes (Table 5). One of these genes
(belonging to arCOG04866) is present in the vicinity of the operon coding for Mtr
in Methanosaeta thermophila, Methanobacteriales, Methanopyrales and
Methanocellales genomes. Its genomic position in methanogens encoding Mtr and its
absence in Methanomassiliicoccales suggest its involvement in the functioning of
Mtr. Fifteen genes present in the three Methanomassiliicoccales genomes have
homologues (and/or paralogs in the case of atwA and the mcr/mrt operons)
conserved in all other methanogens and not in other archaea and could still be
considered as methanogenesis markers (Table 5). Interestingly, 13 of these genes,
including the mcr operon, are clustered on a small genomic portion (~16 kb) of
M. luminyensis and "Ca. M. alvus". With the exception of mcrABG and atwA [58],
they encode proteins of unknown function. One of these proteins (WP_019176775.1,
AGN26870, AGI85145), not previously reported as a methanogenesis marker, might be
associated with the functioning of Mcr as it is encoded by a gene located
directly upstream of mcrA in the three Methanomassiliicoccales genomes. The
nifD-like (NflD) gene previously proposed to be involved in the biosynthesis of
coenzyme F430, the prosthetic group of Mcr/Mrt, is also present in the three
genomes [59]. It forms a cluster with a UDP-N-acetylmuramyl pentapeptide
synthase-like gene (Table 5) and a nifH-like gene also suggested to be involved
in coenzyme F430 biosynthesis. Several uncharacterized proteins are shared by
almost all methanogens, while present in very few other archaea, suggesting a
tight relationship with methanogenesis (Table 5). This is for example the case of
a putative methyltransferase MtxX [60], missing among methanogens only in
Methanosaeta concilii GP6 (where it is still present as a pseudogene, MCON_2260)
and present among non-methanogens only in Ferroglobus placidus DSM 10642.

Table 5 Core proteins of methanogenesis

Annotation | "Ca. M. alvus" | "Ca. M. intestinalis" | M. luminyensis | Distribution | arCOG
Nitrogenase molybdenum-iron like protein (NifD-like/NflD) | [illegible] | [illegible] | WP_019176684.1 |  | [illegible]
UDP-N-acetylmuramyl pentapeptide synthase like protein (MurF-like) | [illegible] | [illegible] | WP_019176685.1 |  | [illegible]
Methyl-coenzyme M reductase operon associated like protein (McrC-like) | [illegible] | [illegible] | WP_019176790.1 |  | [illegible]
Conserved hypothetical protein | AGI85156 | AGN26012 | WP_019176789.1 | 1 | arCOG04904
CoA-substrate-specific enzyme activase | AGI85155 | AGN26011 | WP_019176788.1 | 1 | arCOG02679
Conserved hypothetical protein | AGI85154 | AGN26010 | WP_019176787.1 | 1 | arCOG04903
Conserved hypothetical protein | AGI85153 | AGN26009 | WP_019176786.1 | 1 | arCOG04901
Peptidyl-prolyl cis-trans isomerase related protein | AGI85152 | AGN26008 | WP_019176785.1 | 1 | arCOG04900
Methyl coenzyme M reductase operon associated protein (McrC) | AGI85151 | AGN26006 | WP_019176783.1 | 1 | arCOG03225
Methyl-coenzyme M reductase, component A2 (AtwA) | AGI85150 | AGN26005 | WP_019176782.1 |  | arCOG00185
Methyl coenzyme M reductase, beta subunit (McrB/MrtB) | AGI85141 | AGN26874 | WP_019176771.1 |  | arCOG04860
Methyl coenzyme M reductase, protein D (McrD/MrtD) | AGI85142 | AGN26873 | WP_019176772.1 |  | arCOG04859
Methyl coenzyme M reductase, gamma subunit (McrG/MrtG) | AGI85143 | AGN26872 | WP_019176773.1 |  | arCOG04858
Methyl coenzyme M reductase, alpha subunit (McrA/MrtA) | AGI85144 | AGN26871 | WP_019176774.1 |  | arCOG04857
SH3 fold protein | AGI85145 | AGN26870 | WP_019176775.1 | 1 | arCOG04846
Conserved hypothetical protein | AGI85146 | AGN26876 | WP_019176769.1 |  | arCOG02882
AIR synthase-like protein | AGI85549 | AGN26462 | WP_019176932.1 | 2 | arCOG00640
Predicted DNA-binding protein containing a Zn-ribbon domain | AGI84948 | AGN25597 | WP_019176187.1 | 2* | arCOG01116
Methyltransferase related protein (MtxX) | AGI85117 | AGN26654 | WP_019177314.1 | 3 | arCOG00854
Conserved hypothetical protein | AGI84870 | AGN25885 | WP_019178690.1 | 3* | arCOG04893
Fe-S oxidoreductase, related to NifB/MoaA family |  |  |  | 4* | arCOG00950
Conserved hypothetical protein |  |  |  | 4 | arCOG04866
N5-methyltetrahydromethanopterin: coenzyme M methyltransferase, subunit A (MtrA) |  |  |  | 4* | arCOG03221
N5-methyltetrahydromethanopterin: coenzyme M methyltransferase, subunit B (MtrB) |  |  |  | 4 | arCOG04867
N5-methyltetrahydromethanopterin: coenzyme M methyltransferase, subunit C (MtrC) |  |  |  | 4 | arCOG04868
N5-methyltetrahydromethanopterin: coenzyme M methyltransferase, subunit D (MtrD) |  |  |  | 4 | arCOG04869
N5-methyltetrahydromethanopterin: coenzyme M methyltransferase, subunit E (MtrE) |  |  |  | 4 | arCOG04870
Soluble P-type ATPase |  |  |  | 5 | arCOG01579
Uncharacterized conserved protein |  |  |  | 5† | arCOG04844
Conserved hypothetical protein (putative kinase) |  |  |  |  | arCOG04885

Protein accession numbers with the same font (bold, italics or bold-italics) are encoded by genes situated close to each other in their respective genomes.
* Paralogues.
† Related to a bacterial cluster with same conserved domain.
1, Methanogenesis marker, present in and unique to all sequenced methanogens and not in other archaea.
2, Present in all sequenced methanogens and less than 5% of other sequenced archaea.
3, Present in more than 90% of sequenced methanogens including Methanomassiliicoccales and less than 5% of other sequenced archaea.
4, Absent from the Methanomassiliicoccales but present and unique to all other methanogens.
5, Absent from the Methanomassiliicoccales but present in more than 90% other methanogens and not in other archaea.
6, Absent from the Methanomassiliicoccales but present in more than 90% of sequenced methanogens and less than 5% of other sequenced archaea.
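The six distribution categories defined in the footnotes of Table 5 amount to a small decision rule over three quantities: presence in the Methanomassiliicoccales, the fraction of other sequenced methanogens carrying the gene, and the fraction of other sequenced archaea carrying it. The sketch below encodes one reading of those footnotes; the function name and the exact handling of boundary cases (for example, treating "more than 90%" as relative to the other methanogens) are assumptions made for illustration.

```python
from typing import Optional


def marker_category(in_mmc: bool,
                    frac_other_methanogens: float,
                    frac_other_archaea: float) -> Optional[int]:
    """Assign a Table 5 distribution category (1 to 6), or None.

    in_mmc:                 present in the three Methanomassiliicoccales
    frac_other_methanogens: fraction (0..1) of other sequenced methanogens
    frac_other_archaea:     fraction (0..1) of non-methanogenic archaea
    """
    in_all = frac_other_methanogens == 1.0
    in_most = frac_other_methanogens > 0.9
    unique = frac_other_archaea == 0.0
    rare_elsewhere = frac_other_archaea < 0.05

    if in_mmc:
        if in_all and unique:
            return 1
        if in_all and rare_elsewhere:
            return 2
        if in_most and rare_elsewhere:
            return 3
        return None
    if in_all and unique:
        return 4
    if in_most and unique:
        return 5
    if in_most and rare_elsewhere:
        return 6
    return None
```

Under this reading, a gene such as mtxX, which is missing from a single methanogen and present in a single non-methanogenic archaeon, falls into category 3, matching its entry in Table 5.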

Other genes present in the three genomes are more widely distributed than in
methanogens alone but play a crucial role in methanogenesis. This is the case for
genes required for the biosynthesis of coenzyme M and coenzyme B, which are
involved in the last step of methanogenesis. Inferred CoM biosynthesis uses
sulfopyruvate, which originates from 3-phosphoserine converted to cysteate by a
cysteate synthase and then to sulfopyruvate (ComDE), as observed in
Methanosarcinales, Methanomicrobiales [61] and Methanocellales (Additional
file 1: Table S11). An alternative pathway takes place in other methanogens,
where CoM originates from phosphoenolpyruvate and sulfite to produce
sulfolactate, which is then oxidized [62-64]. These steps require the activity of
enzymes encoded by the comABC genes, which are absent in the three genomes,
similar to what is observed in Methanosarcinales and Methanomicrobiales
(Additional file 1: Table S11).

Energy conservation 

Methanogenesis is coupled to energy conservation through the establishment of a
proton and/or sodium ion electrochemical gradient across the cytoplasmic membrane
that drives an archaeal-type A1AO ATP synthase complex to form ATP [65]. The
genes coding for this complex are found in close association with the putative
origin of replication in the three genomes (Figure 2, Table 6). The exergonic
reduction of the heterodisulfide CoM-S-S-CoB formed by the Mcr complex is a
crucial step for energy conservation, conserved in all methanogens. The three
genomes harbor at least one copy of hdrA, hdrB and hdrC homologues encoding a
soluble heterodisulfide reductase (Table 6), HdrB carrying the catalytic activity
for CoM-S-S-CoB reduction. The HdrA identified here differs from its homologues
present in other methanogens by its longer size and the presence of two predicted
FAD-binding sites instead of one, and three 4Fe-4S centers instead of four. The
three genomes also contain homologues of hdrD, encoding the catalytic site of a
second class of heterodisulfide reductase (HdrDE), but no homologues of hdrE
encoding the membrane-bound cytochrome subunit of this complex. Similarly to the
Methanococcales, Methanobacteriales and Methanopyrales, the hdrB and hdrC genes
are adjacent whereas the hdrA gene is located apart and in close association with
mvhDGA encoding the cytoplasmic F420-non-reducing hydrogenase, absent from
members of the Methanosarcinales and some Methanomicrobiales [66]. MvhA contains
the Ni-Fe domain for activation of H2. MvhADG and HdrABC were shown to form a
complex that couples the reduction of CoM-S-S-CoB and a ferredoxin with H2
through a flavin-based electron bifurcation in Methanothermobacter marburgensis
[67]. The presence of MvhADG and HdrABC in the three Methanomassiliicoccales
representatives suggests a similar process (Figure 4). Energy conservation likely
results from the subsequent reoxidation of ferredoxin coupled to translocation of
H+ (or possibly Na+) across the membrane by a membrane-associated enzymatic
complex (Figure 4), as proposed by Thauer et al. [68] for M. stadtmanae. However,
the Ehb complex likely responsible for the translocation of Na+ in M. stadtmanae
is not present in the three Methanomassiliicoccales representatives.

The only identified complex shared by the three genomes 
which could fulfil this role corresponds to the 11-subunits 
respiratory complex I found in a large number of archaea 
and bacteria [69]. This complex is homologous to the 
Fpo complex (F420H2 dehydrogenase) of Methanosarcinales 
[70]. Characterized respiratory complex I and Fpo catalyze 
the exergonic transfer of electrons from a cytoplasmic 
electron transporter to a membrane soluble electron 
transporter coupled to the translocation of ions across 
the membrane [69,70]. A similar process in Methano- 
massiliicoccales would thus imply a membrane associated 
electron transport chain which was so far only observed in 
Methanosarcinales among methanogens. The currently 
predicted enzymatic complex is truncated as compared to 
the Fpo of Methanosarcina spp. with the lack of homo- 
logues of the FpoO and FpoF subunits, forming an 
FpoABCDHIJKLMN like complex (Figure 4, Table 6). The 
lack of the FpoF subunit is similar to the Fpo complex of 
Methanosaeta representatives which were proposed to 
use ferredoxin instead of F420H2 as electron donor [71] 
(Table 6). The three genomes also harbor genes required 
for biosynthesis of a liposoluble electron transporter 
(Additional file 3, Table 6), whose role may be to accept 
electrons from the Fpo complex [72]. This membrane- 
soluble electron carrier, whose biochemical nature has 
to be determined experimentally, would drive electron 
transfer in the membrane, linking the Fpo complex to 
another membrane bound protein/complex, possibly a 
second coupling site reducing the heterodisulfide. The 
energy-converting hydrogenase EchA-F is another mem- 
brane enzymatic complex which could also translocate 
ions by the re-oxidation of the ferredoxin [73] but it only 
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Table 6 Genes involved In energy conservation In "Ca. M. alvus", "Ca. M. Intestlnalls" and M. luminyensis and accession 
numbers of the proteins they encode 

"Ca. M. alvus" "Ca. M. intestinalis" M. luminyensis Transmembrane helices 

ATP synthase 



ahaH 


AGI84762.1 


AGN25422.1 


WP_01 


9178382.1 


no 


ahal 


AGI84763.1 


AGN25423.1 


WP_01 


9178381.1 


yes 


ahoK 


AGI84764.1 


AGN25424.1 


WP_01 


9178380.1 


yes 


ahoE 


AGI84765.1 


AGN25425.1 


WP_01 


9178379.1 


no 


ahaC 


AGI84766.1 


AGN25426.1 


WP_01 


9178378.1 


no 


ohoF 


AGI84767.1 


AGN25427.1 


WP_01 


9178377.1 


no 


ohaA 


AGI84768.1 


AGN25428.1 


WP_01 


9178376.1 


no 


ohoB 


AGI84769.1 


AGN25429.1 


WP_01 


9178375.1 


no 


ohaD 


AGI84770.1 


AGN25430.1 


WP_01 


9178374.1 


no 


Membrane-bound proton-translocating pyrophosphatase 










hppA 


/ 


AGN26077.1 


WP_01 


9176822.1 


yes 


Heterodisulfide reductase 










hdrA 


AGI85054.1 


AGN25863.1 


WP_01 


9177460.1 


no 


hdrBl 


AGI86093.1 


AGN25718.1 


WP_01 


9177711.1 


no 


hdrB2 


AGI85474.1 


AGN25916.1 


WP_01 


9176125.1 


no 


hdrCl 


AGI86094.1 


AGN25719.1 


WP_01 


9177712.1 


no 


hdrC2 


/ 


/ 


WP_01 


9176126.1 


no 


hdrDl 


AGI86375.1 


AGN25510.1 


WP_01 


9178460.1 


no 


hdrD2 


AGI86212.1 


AGN25649.1 


WP_01 


9177852.1 


no 


hdrD3 


/ 


/ 


WP_01 


9177557.1 


no 


hdrE 


/ 


/ 




/ 


/ 


Methyl-viologen 


-reducing hydrogenase 










mvhDl 


AGI85055.1 


AGN25864.1 


WP_01 


9177459.1 


no 


mvhD2 


/ 


AGN25453.1 


WP_01 


9176201.1 


no 


mvhD3 


/ 


/ 


WP_01 


9176130.1 


no 


mvhG 


AGI85056.1 


AGN25865.1 


WP_01 


9177458.1 


no 


mvhA 


AGI85057.1 


AGN25866.1 


WP_01 


9177457.1 


no 


F420H2 dehydrogenase-like/1 1-subunit respiratory complex 1 








fpoA 


AHA34030.1 


AGN25601.1 


WP_01 


9176183.1 


yes 


fpoB 


AGI84952.1 


AGN25602.1 


WP_01 


9176182.1 


no 


fpoC 


AGI84953.1 


AGN25603.1 


WP_01 


9176181.1 


no 


fpoD 


AGI84954.1 


AGN25604.1 


WP_01 


9176180.1 


no 


fpoF 


/ 


/ 




/ 




fpoH 


AGI84955.1 


AGN25605.1 


WP_01 


9176179.1 


yes 


fpol 


AGI84956.1 


AGN25606.1 


WP_01 


9176178.1 


no 


fpoJN 


AGI84957.1 


AGN25607.1 


WP_01 


9176177.1 


yes 


fpoJc 


AGI84958.1 


AGN25608.1 


WP_01 


9176176.1 


yes 


fpoK 


AGI84959.1 


AGN25609.1 


WP_01 


9176175.1 


yes 


fpoL 


AGI84960.1 


AGN25610.1 


WP_01 


9176174.1 


yes 


fpoM 


AGI84961.1 


AGN25611.1 


WP_01 


9176173.1 


yes 


fpoN 


AGI84962.1 


AGN25612.1 


WP_01 


9176172.1 


yes 
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Table 6 Genes involved in energy conservation in "Ca, M. alvus", "Ca, M. intestinalis" and M. luminyensis and accession 
numbers of the proteins they encode (Continued) 



fpoO 


1 


/ 




/ 




Energy-converting hydrogenase 










echAl 


1 


AGN25511.1 


WP_ 


_01 91 78471.1 


yes 


echA2 


1 


AGN26997.1 


WP_ 


_01 91 76386.1 


yes 


echBl 


1 


AGN25512.1 


WP_ 


_01 91 78472.1 


yes 


echB2 


1 


AGN26998.1 


WP_ 


_01 91 76385.1 


yes 


echCl 


1 


AGN25513.1 


WP_ 


_01 91 78473.1 


no 


echC2 


1 


AGN26999.1 


WP_ 


_01 91 76384.1 


no 


echDl 


1 


AGN25514.1 


WP_ 


_01 91 78474.1 


no 


echD2 


1 


AGN27000.1 


WP_ 


_01 91 76383.1 


no 


echEl 


1 


AGN25515.1 


WP_ 


_01 91 78475.1 


no 


echE2 


1 


AGN27001.1 


WP_ 


_01 91 76382.1 


no 


echFl 


1 


AGN25516.1 


WP_ 


_01 91 78476.1 


no 


echF2 


1 


AGN27002.1 


WP_ 


.019176381.1 


no 


Liposoluble electron transporter synthesis 










/sp/\' 


AGI84964.1 


AGN25614.1 


WP_ 


,0191761 70.1 


/ 


ubiA^ 


AGI85875.1 


AGN26416.1 


WP_ 


_01 91 78349.1 


/ 






AGN26109.1 








ubiE' 


AGI85874.1 


AGN26417.1 


WP_ 


_01 91 78072.1 


/ 






AGN25541.1 


WP_ 


,0191781 98.1 










WP_ 


_01 91 76998.1 





^encoding a geranylgeranyl pyrophosphate synthase (GGPPS). 
"^encoding a1,4-dihydroxy-2-naphthoate octaprenyltransferase (DHNOPT). 
'^encoding a 2-heptaprenyl-1,4-naphthoquinone methyltransferase (HPNQMT). 



occurs in M. luminyensis and "Ca. M. intestinalis"
(Figure 4). Nevertheless, EchA-F could also operate in
reverse and exploit the chemiosmotic gradient for anabolic
reactions [74]. Finally, a gene encoding a membrane-bound
pyrophosphatase is found in the genomes of M. luminyensis
and "Ca. M. intestinalis" (Table 6) but not in
"Ca. M. alvus". This protein is predicted to translocate
protons across the cytoplasmic membrane
by hydrolysis of PPi to phosphate [75,76].

The three genomes share an original combination of
genes likely involved in energy conservation, suggesting
a process different from what is observed in other methanogens.
The predicted flavin-based electron bifurcation in the
MvhADG/HdrABC complex is a feature shared by most
methanogens, with the exception of Methanosarcinales
and some Methanomicrobiales representatives, while the
putative membrane-associated electron transport chain
related to the activity of the Fpo-like complex was so far a
unique feature of Methanosarcinales among methanogens.
However, no membrane-bound cytochrome like
those of the Methanosarcinales was found to be
encoded by the three genomes, and the complete process
remains to be uncovered.



Amber codon usage and putative Pyl-containing proteins 

Previous studies have shown that the genes coding for
methyl:corrinoid methyltransferases B dedicated to
methylamine utilization (mtmB, mtbB and mttB for
mono-, di- and tri-methylamines, respectively) present
in M. luminyensis, "Ca. M. intestinalis" and "Ca. M. alvus"
contain an in-frame amber Pyl-encoding codon [7,8,25],
similarly to what is observed in Methanosarcinaceae and
in a few bacteria [77,78], where it encodes the 22nd
proteinogenic amino acid, pyrrolysine (Pyl, O). All the
necessary genetic machinery is found in the three
Methanomassiliicoccales genomes, including the genes
for pyrrolysine synthesis (pylBCD), the amber suppressor
tRNA(Pyl) (pylT) and the dedicated aminoacyl-tRNA
synthetase (pylS) [25]. The presence of amber-decoding
machinery raises the question of whether Pyl occurs in
proteins other than the methyltransferases involved in
methylotrophic methanogenesis. This possibility was addressed in
the present study by searching for all the TAG-interrupted
CDSs which share the same BLASTP hit with the virtual
in-frame translation of their 3' flanking region. These CDSs
were fused in silico into a single CDS, stopping at the next
stop codon, and predicted as potentially incorporating Pyl
during the translation process. As a positive control, this
strategy identified the above-mentioned methylamine:corrinoid
methyltransferases in the three genomes. No
other putative Pyl-containing proteins were identified
in M. luminyensis. One additional amber-containing CDS
was found in "Ca. M. intestinalis", a putative Fe-S
binding protein (AGY50215), which is absent in "Ca. M.
alvus" and present in M. luminyensis but not predicted
to incorporate Pyl. "Ca. M. alvus" contains the highest
number of predicted Pyl-containing proteins, 16 in addition
to the methylamine:corrinoid methyltransferases (Table 7,
Figure 5). Half of them have homologues in the two
other genomes but without in-frame amber codons (in
bold, Table 7). Among these 16 proteins, several have a
hypothetical function and some are highly conserved in
methanogens and/or archaea. This is the case of a
digeranylgeranylglyceryl phosphate synthase required in the
synthesis of archaeal phospholipids and of the putative
methyltransferase MtxX (Tables 5 and 7). The CRISPR-associated
cas1 gene, although present in the three
genomes, is only detected as encoding a Pyl-containing enzyme
in "Ca. M. alvus". The activity and the effective
incorporation of Pyl in such a large range of enzymes of the
same organism remain to be determined experimentally.
However, this could reasonably be assumed considering
the existence of a few functional Pyl-containing proteins
(other than methylamine:corrinoid methyltransferases)
reported from both Pyl-decoding archaea and bacteria
[77,79,80].
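
As a rough illustration of the in silico fusion step described above, the following Python sketch translates a CDS that ends with TAG, reads that amber codon as pyrrolysine (O) and continues into the 3' flanking region until the next stop codon. The function name `fuse_across_amber` and its input format are hypothetical, and the BLASTP comparison used in the study to select candidate CDSs is not reproduced here.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def fuse_across_amber(cds, downstream):
    """Translate a TAG-terminated CDS fused to its 3' flanking region.

    The terminal TAG of `cds` is read as pyrrolysine ('O'); translation
    then continues in frame through `downstream` and stops at the next
    stop codon (TAA, TGA or TAG). Hypothetical helper for illustration.
    """
    if len(cds) % 3 != 0 or cds[-3:].upper() != "TAG":
        raise ValueError("cds must be in frame and end with a TAG codon")
    seq = (cds + downstream).upper()
    amber_index = len(cds) // 3 - 1  # codon position of the read-through TAG
    protein = []
    for n in range(len(seq) // 3):
        codon = seq[3 * n:3 * n + 3]
        if n == amber_index:
            protein.append("O")
            continue
        aa = CODON_TABLE.get(codon, "X")  # 'X' for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

A fused product obtained this way would still need to be compared by BLASTP with the original CDS hit before being considered a candidate Pyl-containing protein.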

Particular genetic signals in the genes containing an
in-frame TAG have been proposed to enhance the
incorporation of Pyl in the proteins but are not strictly
required for that purpose [81]. Two alternative adaptations
have been proposed for Methanosarcina spp. and the
bacterium Acetohalobium arabaticum to minimize proteome
alteration resulting from the insertion of Pyl at the
stop codons normally intended to stop translation [77].
In A. arabaticum, the expression of the Pyl-cassette has
been shown to be regulated by substrate (trimethylamine)
availability, while in Methanosarcina spp., which
constitutively express the Pyl-cassette [79,82], the frequency of
genes ended by a TAG stop codon is minimized (~4-5% in
Methanosarcina spp. vs. 20-30% in A. arabaticum and
other Pyl-decoding bacteria, see Additional file 1: Table
S12 adapted from [77]). Accordingly, the extremely low
frequency of TAG stop codons in "Ca. M. alvus" (1.6%)
suggests a constitutive expression of the Pyl-cassette and
an efficient ability to incorporate Pyl in proteins (Figure 5,
Additional file 1: Table S12). In such a tRNA(Pyl)-suppressing
context, the appearance of an in-frame amber codon in a



Table 7 Putative Pyl-containing proteins in "Ca. M. alvus"

Accession number | Annotation | Size^a | Comments
AGI84833.1 | hypothetical protein | 253 | DPM synthase like/GT2 superfamily
AGI85009.1 | hypothetical protein | 270 | digeranylgeranylglyceryl phosphate synthase
AGI85117.2 | phosphotransacetylase-like protein | 242 | putative methyltransferase MtxX
AGI85168.2 | filamentation induced by cAMP protein Fic | 425 |
AGI85186.1 | hypothetical protein | 149 | Rv0623-like transcription factor
AGI85280.1 | hypothetical protein | 917 | glycosyltransferase family 29
AGI85290.1 | hypothetical protein | 148 |
AGI85300.1 | hypothetical protein | 444 | ATPase domain
AGI85437.1 | hypothetical protein | 536 | prophage Lp3 protein 8 (helicase) of Lactobacillus spp.
AGI85443.1 | hypothetical protein | 717 |
AGI85449.1 | hypothetical protein | 262 | putative methyltransferase
AGI85596.1 | hypothetical protein | 162 | putative acetyltransferase
AGI85630.1 | hypothetical protein | 322 | CRISPR-associated endonuclease cas1
AGI85862.1 | MMA:corrinoid methyltransferase | 459 |
AGI85863.1 | MMA:corrinoid methyltransferase | 461 |
AGI85869.1 | TMA:corrinoid methyltransferase | 504 |
AGI85870.1 | DMA:corrinoid methyltransferase | 469 |
AGI86303.1 | hypothetical protein | 389 | Sel-1 domain containing protein
AGI86346.1 | transporter family protein | 289 | bacterial/archaeal transporter family protein
AGI86379.1 | uncharacterized protein | 187 | conserved in archaea (DUF531)

Proteins in bold indicate homologs in the two other members of the Methanomassiliicoccales, devoid of Pyl. Proteins in italics indicate homologs in the two
other members of the Methanomassiliicoccales also containing Pyl.
^a Number of amino acids.



Borrel et al. BMC Genomics 2014, 15:679
http://www.biomedcentral.com/1471-2164/15/679

[Figure 5: bar chart for "Ca. M. alvus", "Ca. M. intestinalis" and M. luminyensis. Series: % of predicted amber stop codons; number of predicted Pyl-containing proteins (without MtmB/MtbB/MttB).]

Figure 5 Comparison of the number of putative Pyl-containing proteins (other than MtmB/MtbB/MttB methyltransferases) and of the
percentage of amber codons used as translational stop signals deduced from the three Methanomassiliicoccales genomes.



CDS would lead to a stable mutation, as supported by
the high occurrence of genes predicted to encode
Pyl-containing proteins in "Ca. M. alvus". The phylogenetic
position of "Ca. M. alvus" among a large cluster of gut
methanogens suggests a long evolutionary history in this
type of environment, where mono-, di- and trimethylamine
are likely not limiting [17,83,84] and may be
obtained through the degradation of glycine betaine,
choline and L-carnitine by co-occurring microorganisms
[85-87]. This high availability of methylamines during the
evolution of "Ca. M. alvus", involving a possibly high
and constant expression of the Pyl-machinery, could
have been a driving factor leading to this particularly
low usage of the triplet TAG in CDSs as a termination
signal during translation. In addition, the insertion of an
amber codon in a gene coding for a protein of major
function (such as the highly conserved MtxX, Cas1 or the
digeranylgeranylglyceryl phosphate synthase in the present
case) might have made the expression of the Pyl cassette
and the efficient ability to incorporate Pyl essential for
growth. As a feedback, this would contribute to tightening the
association of "Ca. M. alvus" cluster methanogens with
digestive tract environments. The absence of predicted
Pyl-containing proteins other than MtmB, MtbB and MttB
and the high frequency of genes ended by TAG (11.3%) in
M. luminyensis (Figure 5, Additional file 1: Table S12)
argue for a different handling of the Pyl-encoding capacity,
possibly through a tighter regulation of Pyl
incorporation, and could reflect an adaptation to a lower or
more variable availability of methylamines [88]. Together
with other genomic traits described above, this supports a
wider distribution of M. luminyensis than digestive tract
environments. Following the hypothesis of a methylamine-directed
selective pressure on TAG usage in CDSs of the
Methanomassiliicoccales, the intermediate TAG usage in
the CDSs of "Ca. M. intestinalis" (Figure 5, Additional file 1:
Table S12) would reflect a more stringent association with
digestive tracts compared to M. luminyensis.
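
The TAG usage figures discussed above (for example 1.6% in "Ca. M. alvus" and 11.3% in M. luminyensis) are percentages of CDSs ending with the amber codon. A minimal Python sketch of such a count is given below, assuming the CDSs are available as nucleotide strings that include their stop codon; the function name `stop_codon_usage` is hypothetical and this is not the authors' script.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_codon_usage(cds_list):
    """Return the percentage of CDSs terminated by each stop codon.

    CDSs whose last three bases are not a standard stop codon
    (for example truncated genes) are ignored.
    """
    counts = Counter()
    for cds in cds_list:
        stop = cds[-3:].upper()
        if stop in STOP_CODONS:
            counts[stop] += 1
    total = sum(counts.values())
    if total == 0:
        return {stop: 0.0 for stop in STOP_CODONS}
    return {stop: 100.0 * counts[stop] / total for stop in STOP_CODONS}
```

Under constitutive amber suppression, the value returned for "TAG" would be expected to be low, which is the pattern reported here for "Ca. M. alvus".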

Conclusions 

Several atypical features were identified in the three
genomes, such as the scattering of the ribosomal RNA
genes and the absence of the eukaryotic-like histone gene
otherwise present in most Euryarchaeota genomes.
The lack of the eukaryotic-like histone gene could represent
an ancestral loss in the overall branch composed of
Thermoplasmatales and related lineages, replaced by a
bacterial-type histone in Thermoplasmatales or by the Alba
protein, present in all genomes of the branch with the
exception of "Ca. M. intestinalis" and "Ca. M. alvus".
Intriguingly, the nature of this protein remains elusive
in "Ca. M. intestinalis" and "Ca. M. alvus".

The absence of a large number of genes otherwise
present in all methanogens, but not all restricted to
methanogens, was previously reported in the M. luminyensis
and "Ca. M. alvus" genomes and is presently extended
to "Ca. M. intestinalis". The absence of most of these genes,
involved in the CO2 reduction/methyl-oxidation pathways
in other methanogens, offers a unique context to redefine
the genes encoding enzymes or isoenzymes shared by all
and only methanogens. Interestingly, the reevaluation
shows that this core is not deeply changed when
Methanomassiliicoccales are considered. In addition to
the genes encoding the Mtr complex, only two of these
methanogenesis marker genes are absent from the
Methanomassiliicoccales genomes. Clustered with mcrABG
in a small genomic region in M. luminyensis and "Ca. M.
alvus", core genes encoding uncharacterized proteins could
be intimately involved in the functioning of the Mcr
complex. The process of energy conservation associated
with methanogenesis by methyl-compound reduction with
H2 was analyzed. The original composition of genes
presently identified as taking part in this process suggests the
involvement of a flavin-based electron bifurcation and a
membrane-associated electron transport chain, which are
distinctive elements of the two main energy conservation
processes defined in other methanogens. However, the
complete process remains to be uncovered and several
components have to be characterized.

While the three Methanomassiliicoccales representatives
were cultured from the gastrointestinal tract, the analysis
of their genomes revealed differential adaptations to this
environment and possibly contrasting evolutionary histories.
One of the striking differences among the three species
lies in their usage of the TAG codon, which could
have been shaped by the availability of methylamines as
a substrate during their evolution. The long-term adaptation
of "Ca. M. alvus" to GIT environments, suggested
by its position among a large cluster of GIT-derived
sequences, is supported by its gene composition, along
with lateral gene transfer from GIT-associated bacteria.
The phylogenetic position of M. luminyensis and "Ca.
M. intestinalis" among soil and sediment methanogens
suggests a more recent adaptation or more facultative
association with GIT conditions. Consistent with this hypothesis,
the M. luminyensis genome contains several important
genes which are specifically present in soil and sediment
methanogens. Although phylogenetically close to M. luminyensis,
"Ca. M. intestinalis" has a reduced genome with a
lower [G + C] % and does not share the signatures of soil or
sediment adaptation of M. luminyensis. These differences
could reflect a phenomenon of streamlining in the "Ca. M.
intestinalis" genome linked with its adaptation to GIT
conditions. A similar phenomenon was previously reported
in free-living bacteria [29] and, with more extreme amplitude,
in obligate pathogens [89] as well as in bacterial [90]
and archaeal [91] symbionts.

Methods 

Gene structure prediction 

Complete genome sequences of "Ca. M. alvus" [GenBank:
NC_020913.1] and "Ca. M. intestinalis" [GenBank: NC_021353.1]
were obtained from enriched consortia of stool-derived
cultures from a 91-year-old woman, with
average genome sequence coverages of 36.9-fold
and 42.7-fold, respectively [7,8]. Genomic sequences from
Methanomassiliicoccus luminyensis B10 were retrieved
from the GenBank database [GenBank: CAJE01000001-CAJE01000026].
Raw sequences from "Ca. M. alvus"
Mx1201, "Ca. M. intestinalis" Mx1-Issoire and M. luminyensis
were fed to the RAST Annotation server [92] using
Glimmer3 [93] for open reading frame prediction. The
RAST annotation used release 59 of FIGfam and
no frameshift-fixing parameters. To perform an accurate
structural annotation of these genomes, a comparative
analysis of the "Ca. M. alvus", "Ca. M. intestinalis" and M.
luminyensis annotated proteomes was conducted using
the TBLASTN program. To identify genes or distantly related
genes, a BLOSUM45 substitution matrix was chosen,
and low-complexity filters were disabled. TBLASTN
analyses were manually validated to take into account
genes with frameshifts due to sequencing errors. Translation
start codons were then validated through a BLASTP
comparative analysis of the three annotated proteomes.
Protein sequences from the three proteomes were compared
together with the curated SWISS-PROT protein
sequence database [94]. Results were filtered using 80%
length and 40% identity thresholds, and start codons were
manually corrected taking into account protein sizes and
local alignments. Non-coding RNAs were predicted using
the Rfam database [95] with an E-value threshold of 1,
and results were manually curated. Additional analyses
were performed to detect tRNAs by merging results
from tRNAscan [96], TFAM [97], ARAGORN [98] and
BLASTN [99]. CRISPRFinder [100] was applied to
each of the three genomes to detect CRISPR loci, which
were compared using CRISPRcompar [101] and
CRISPRmap [102]. Finally, prophages were sought using
PHAST [103]. Circular representations of the "Ca. M.
alvus" and "Ca. M. intestinalis" genomes were generated
using the CGView Server [104].
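
The 80% length and 40% identity filter mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: the hits are taken to be in the 12-column tabular BLAST format (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score), "length" is interpreted as the fraction of the query covered by the alignment, and the function name `filter_blast_hits` and the `query_lengths` dictionary are hypothetical.

```python
def filter_blast_hits(rows, query_lengths, min_identity=40.0, min_coverage=0.8):
    """Keep tabular BLAST hits passing identity and query-coverage thresholds.

    rows: iterable of tab-separated lines in 12-column tabular format.
    query_lengths: dict mapping query id to its sequence length (assumed
    to be known from the query FASTA file).
    """
    kept = []
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        # Fraction of the query covered by this alignment (assumed definition).
        coverage = (abs(qend - qstart) + 1) / query_lengths[qseqid]
        if pident >= min_identity and coverage >= min_coverage:
            kept.append((qseqid, sseqid, pident, coverage))
    return kept
```

The same function could be called with `min_identity=30.0` for comparisons that use a 30% identity threshold with 80% length coverage.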

Comparative genome analysis and functional annotation 

An all-versus-all BLASTP comparison of the predicted
protein sequences within each of the three genomes was
conducted [99]. On the basis of the best BLASTP hits,
orthologous relationships were established between the
protein sequences of "Ca. M. alvus", "Ca. M. intestinalis"
and M. luminyensis. A Venn diagram was then drawn
using the Venny web service [105]. Predicted functions
provided by the RAST annotation server for each CDS
of the three species were kept as functional annotation.
Using the orthology relationships previously established, a
functional annotation transfer was performed. Protein
sequences of genes with frameshift mutations were
manually reconstructed. In order to distinguish protein
sequences only found within the three genomes from
protein sequences shared with closely related species, a
BLASTP analysis was conducted. Each protein sequence
from the core proteome was compared to i) the proteomes of
phylogenetic neighbors (Aciduliprofundum boonei T469,
accession code: NC_013926; Aciduliprofundum sp. MAR08-339,
accession code: NC_019942; Ferroplasma acidarmanus
fer1, accession code: CM000428; Thermoplasma acidophilum
DSM 1728, accession code: NC_002578; Thermoplasma
volcanium GSS1, accession code: NC_002689 and
MG-II, accession code: CM001443), ii) methanogenic
archaea from the human gut (Methanobrevibacter smithii
ATCC 35061, accession code: NC_009515 and
Methanosphaera stadtmanae DSM 3091, accession code:
NC_007681) and iii) the NCBI non-redundant protein
sequence database (release 12/2012). The identity threshold
was set at 30% with a minimum length coverage of 80%.
An arCOG [45] analysis was also performed using the
December 2012 release (ftp://ftp.ncbi.nih.gov/pub/wolf/COGs/arCOG/).
Each annotated protein sequence from
the three genomes was compared to the arCOG database
using BLASTP and an E-value threshold equal to le"^. 
The arCOG profiles of the three genomes and those of
the arCOG database were used to identify proteins
potentially shared by all and only methanogens, as well
as proteins almost specific to methanogens and shared
by almost all methanogens. The distribution of each selected
protein among sequenced organisms was checked by
BLASTP. Conserved domains of the selected proteins
were compared to those of the closest hits that belong
to non-methanogens, and phylogenetic trees were
constructed to verify their monophyly. Additional proteomes
from various archaeal orders were also submitted to this
comparison: A. boonei T469; Archaeoglobus fulgidus DSM
4304, accession code: NC_013926; Archaeoglobus veneficus
SNP6, accession code: NC_015320; M. smithii ATCC
35061; M. stadtmanae DSM 3091; Thermoplasma acidophilum
DSM 1728, accession code: NC_002578 and
MG-II. In order to detect putative lateral gene transfers,
the same BLASTP analysis was performed for the three
proteomes using the UniprotKB database [106]. Only best
hits were retrieved and classified according to the three
domains of life: Archaea, Bacteria or Eukaryota. The
genomes of the Methanomassiliicoccales representatives
were not included in the subject database. Metabolic
pathway reconstruction was performed through the
KEGG Automatic Annotation Server (KAAS) [107] using
a bi-directional best hit strategy and a custom list of
reference organisms. Based on best BLAST hit results
from the three proteomes, 40 species were selected for the
KAAS (three-letter organism codes are listed as follows:
abi, mac, tac, mba, rci, mig, afu, mpd, tba, mpi, pab, mka,
pho, mhu, mja, mla, mth, cdc, amt, drm, mbn, ssg, ele,
fnu, mel, mrv, fsv, tsi, Iba, ral, sti, msi, see, eco, ere, aas,
eha, sfu, bla, cau). The transportome was determined
using the TransportTP server [108] (reference organism:
Escherichia coli; E-value threshold: 0.1). Results were
manually validated and curated by BLASTP analysis
against TransportDB [109], taking into account orthology
relationships. Signal peptides, transmembrane helices
and PFAM domains were predicted through the
InterProScan annotation module provided by the BLAST2GO
software [110] with default parameters.
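
The lateral gene transfer screen described above keeps only the best hit of each protein and classifies it by domain of life. A minimal Python sketch of that tallying step follows, assuming each hit has already been annotated with the domain of its subject sequence and that "best" means the highest bit score; the function name `count_best_hit_domains` and the tuple layout are hypothetical.

```python
from collections import Counter


def count_best_hit_domains(hits):
    """Tally the domain of life of the best hit of each query protein.

    hits: iterable of (query_id, subject_domain, bit_score) tuples, where
    subject_domain is 'Archaea', 'Bacteria' or 'Eukaryota'.
    """
    best = {}
    for query, domain, bit_score in hits:
        # Keep the highest-scoring hit seen so far for this query.
        if query not in best or bit_score > best[query][1]:
            best[query] = (domain, bit_score)
    return Counter(domain for domain, _ in best.values())
```

Proteins whose best hit falls in Bacteria would only be candidates for lateral gene transfer; a phylogenetic tree of each candidate is still needed to support the inference.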

Phylogenomic analysis of DNA replication components 

Homologs of each major archaeal DNA replication component
were retrieved from the reference sequence database
at the NCBI using the BLASTP program with different
seeds from each archaeal order [99]. The top 100 best hits
for each order were then used to create HMM profiles
[111] (http://www.hmmer.org), which were used to iteratively
search a local database of 142 complete or nearly complete
archaeal genomes including 98 plasmid sequences, as
well as a local database of the available complete archaeal
virus genomes (56 total) downloaded from the Viral
Genomes database of NCBI (as of June 20th 2013). The
absence of a given homolog in a specific genome was verified
by performing additional TBLASTN searches [99].
Multiple alignments were done with MUSCLE 3.8.31 [112]
and manually inspected using the ED program from the
MUST package to remove non-homologous or partial
sequences [113]. The alignments were trimmed using the
software BMGE [114] with default parameters. Phylogenetic
analyses were performed on single protein datasets using
Maximum Likelihood and Bayesian methods. Maximum
likelihood analyses were performed with RAxML [115].
MrBayes 3.2 [116] was used to perform Bayesian analyses
using the mixed amino acid substitution model and four
categories of evolutionary rates. Two independent runs
were performed for each data set, and runs were stopped
when they reached a standard deviation of split frequencies
below 0.01 or the log likelihood values reached stationarity.
The majority-rule consensus trees were obtained
after discarding the first 25% of samples as 'burn-in'.
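
To make the burn-in and majority-rule steps concrete, the sketch below discards the first 25% of sampled trees and keeps the splits (bipartitions) present in more than half of the remaining samples. It is a simplified illustration and not the MrBayes implementation: each sampled tree is assumed to be represented as a set of splits, each split being a frozenset of taxon names, and the function names are hypothetical.

```python
from collections import Counter


def discard_burn_in(samples, fraction=0.25):
    """Drop the first `fraction` of an ordered list of samples."""
    return samples[int(len(samples) * fraction):]


def majority_rule_splits(tree_samples, burn_in=0.25):
    """Return {split: frequency} for splits found in more than 50% of
    the post-burn-in trees.

    tree_samples: ordered list of trees, each given as a set of splits
    (frozensets of taxon names). This representation is an assumption
    made for the example.
    """
    kept = discard_burn_in(tree_samples, burn_in)
    if not kept:
        return {}
    counts = Counter(split for tree in kept for split in tree)
    return {
        split: n / len(kept)
        for split, n in counts.items()
        if n / len(kept) > 0.5
    }
```

The returned frequencies correspond to the support values that would label the branches of a majority-rule consensus tree.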

Data access 

The whole genome shotgun projects, the complete
genome sequences and annotations have been deposited
at DDBJ/EMBL/GenBank for "Ca. M. alvus" Mx1201
[GenBank: CP004049] and for "Ca. M. intestinalis"
Issoire-Mx1 [GenBank: CP005934]. Predicted CDS and
protein sequences for M. luminyensis, some of which
are not annotated in GenBank, are provided respectively
through Additional file 1: Tables S12 and S13.

Additional files 



Additional file 1: Additional tables in a zipped folder containing:
Table S1. tRNA and ncRNA contents for the genomes of the three
Methanomassiliicoccales representatives. Table S2. Codon usage in the
three genomes of Methanomassiliicoccales. Table S3. CRISPR DR
elements found in the three genomes. Table S4. Number of best hits
score among the three domains of life. Table S5. Gene list of the core
genome of the Methanomassiliicoccales, as deduced by a TBLASTN
analysis (with reference to CDS of the "Ca. M. alvus" genome), and their
presence or not in phylogenetic neighbors, human gut
Methanobacteriales and the non-redundant GenBank database. Table S6. CDS list of
the core genome of the Methanomassiliicoccales, absent in phylogenetic
neighbors and the human gut Methanobacteriales. In blue, the 20 CDS not
retrieved in the GenBank database. Table S7. arCOG distribution among the
Methanomassiliicoccales representative genomes, gut methanogens and
some other archaea. Table S8. Complete list of transporters detected
by TransportDB in the three genomes of Methanomassiliicoccales.
Table S9. List of the antioxidant systems in the three genomes of
Methanomassiliicoccales. Table S10. Genes involved in methanogenesis in
"Ca. M. alvus", "Ca. M. intestinalis" and M. luminyensis and accession numbers
of the proteins they encode. Table S11. Comparative presence of the
genes involved in the synthesis of the CoM among the seven orders of
methanogens. Table S12. Numbers of CDS with in-frame TAG, and % of
the total CDS in various genomes of microorganisms coding or not for
pyrrolysine (updated information from Prat et al. [77]). Table S13. CDS list of
M. luminyensis B10. Table S14. Proteome of M. luminyensis B10.

Additional file 2: Additional figures in a zipped folder containing:
Figure S1. CRISPR Direct Repeats structure. The figure shows the 2D,
Minimum Free Energy structure of CRISPR DRs retrieved from the three
genomes of the Methanomassiliicoccales (using RNAfold web server
[117]) and the sequence alignment of M. luminyensis DR with the family
3, motif 27 DRs (using CRISPRmap [34]). Figure S2. Chromosome circular
maps of (A) "Candidatus Methanomethylophilus alvus" Mx1201 and (B)
"Candidatus Methanomassiliicoccus intestinalis" Mx1-Issoire genomes
(generated with CGView [104]). Circles display from outside: 1 and 4, rRNA
genes respectively on forward and reverse strand; 2 and 3, CDS on forward
and reverse strand; 5, BLAST results with a maximum expected value of
1e"^ versus the "Ca. M. intestinalis" proteome; 6, [G + C] % content deviation 
from the average [G + C] % content of the genome. Arrows, location and
sense of the orc1/cdc6 genes. Figure S3. Phylogeny of Cdc6/Orc1 proteins.
Figure S4. Phylogenetic trees of NAD-dependent DNA ligase (A) and
choloylglycine hydrolase (B) genes likely transferred from bacteria to "Ca.
M. alvus". In red, sequences of "Ca. M. alvus"; in blue, sequences from other
gut-associated methanogens. Figure S5. Metabolic comparison of the three
genomes based on KEGG maps. Series of three boxes represent presence or
absence of the E.C.-numbered enzyme (yellow for "Ca. M. alvus", green for
"Ca. M. intestinalis" and blue for M. luminyensis). Green arrows replace
complex pathways. Blue boxes, compounds synthesized by the three species;
red boxes, compounds not synthesized by the three species; orange
boxes, compounds synthesized by at least one species. Question marks
show pathways where at least one enzyme is missing. Figure S6.
Comparison of the physical map of genes involved in methanogenesis
on methyl compounds + H2 in the three analyzed genomes.

Additional file 3: Additional Data in a zipped MS Word file. Details 
on lipids, amino acids and purine synthesis, as well as molecular nitrogen 
fixation deduced from the genomes of the three members of the 
Methanomassiliicoccales. 
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Synthase; DHNOPT: 1,4-dihydroxy-2-naphthoate octaprenyltransferase; 
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EndoNuclease 1; GGPS: (S)-3-0-geranylgeranylglyceryl phosphate synthase; 
GINS: Go-Ichi-Nii-San protein; GIT: Gastro-intestinal tract; GT: Glycosyl Transferase;
H4MP: Tetrahydromethanopterin; Hec: Unknown hydrogenase, probable
energy-converting; HPNQMT: 2-heptaprenyl-1,4-naphthoquinone
methyltransferase; IPPK: Isopentenyl phosphate kinase; LGT: Lateral Gene
Transfer; LSU: Large subunit; MCM: Minichromosome maintenance protein;
MG-II: Uncultured Marine Group II; NCAIR: N5-5-amino-4-imidazole carboxylic
acid ribonucleotide; nr: Non-redundant; ORB: Origin recognition box; ORF: Open
Reading Frame; Ori: Origin of replication; PCNA: Proliferating Cell Nuclear
Antigen; PFOR: Pyruvate:ferredoxin oxidoreductase;
PMDC: Phosphomevalonate decarboxylase; PMK: Phosphomevalonate kinase;
PPS: Polyprenyl synthetase; PriL: Primase large subunit; PriS: Primase small
subunit; PRPP: Phosphoribosyl pyrophosphate; Pyl: Pyrrolysine; RCC: Rumen
cluster C; RFC: Replication Factor C; RNaseH: Ribonuclease H; RPA: Replicative 
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